adr
BAN USER
- 5of 5 votes
AnswersDesign a voting system. 100M users will be logging in within a window of 24h (not necessarily uniformly). Every user will be able to choose from a fixed list of options. If the user has already voted the system should not let them to vote a second time. Additional constraint: only the first 100K votes are accepted. If the quota is exceeded any attempt to vote should be rejected.
- adr in United States| Report Duplicate | Flag | PURGE
Software Engineer System Design - 0of 0 votes
AnswersYou are standing in the top-left corner of n*m grid. At each step you can only move up, down, right or left. Count the number of unique paths to the bottom-right corner of the grid (paths cannot cross themselves). The interviewer suggested that a backtracking solution is not the most performant one.
- adr| Report Duplicate | Flag | PURGE
Software Engineer
My Python solution:
from heapq import *
def get_sr(ls):
z = ([(v,i) for v in l]
for i,l in enumerate(ls))
n = len(ls)
last_seen = [[None, i] for i in range(n)]
missing = (1 << n) - 1
res = None
for v,i in merge(*z):
last_seen[i][0] = v
if missing & (1 << i):
missing ^= (1 << i)
if not missing:
left = min(last_seen)
dist = v - left[0]
if not res or dist < res[0]:
res = (dist, [left[0], v])
missing |= (1 << left[1])
return res[1]
ls = [[4, 10, 15, 24, 26],
[0, 9, 12, 20] ,
[5, 18, 22, 30]]
print(get_sr(ls))
We can use a priority queue where the key is the head value and the value is a pointer to the corresponding stream. We just need to repeatedly pop the min element, output the key, advance the stream to the next element and push it back to the queue using the new head value.
- adr October 06, 2019An alternative O(n*n) solution would be to preprocess the dictionary into a hash map thus: for every dictionary word w of length q add q words to the hashmap which can be created by removing q-th letter from the word w. Example: for the word 'dog' we add 'og', 'dg', 'do' into the hashmap. Then for the input word we do a similar thing: remove i-th letter and see if the resultant word is in the hashmap. It's quadratic because it takes O(n) to hash the word and we need to do it n times.
D = ['apple', 'apple', 'banana', 'orange']
def preprocess_dict():
hmap = set()
for word in D:
for i in range(len(word)):
hmap.add(word[:i] + word[i+1:])
return hmap
H = preprocess_dict()
def word_exists(word):
for i in range(len(word)):
if (word[:i] + word[i+1:]) in H:
return True
return False
print (word_exists('aPple'))
Preprocess the dictionary into a trie. On every iteration we will have letter z of the input word and set S comprising of pairs (x,r) where x indicates our non-matching character allowance, and r is a trie node. Initially we put (1,R) to S, where R is the trie root node. For every node (x,r) in S and every child node w of r we put (x,w) back to S if w corresponds to z; otherwise we put (x-1,w) provided that x>0. Terminate when an end-of-word child node is encountered meaning we found a match. Complexity: quadratic worst case. This is because on every iteration the increment of the size of S is bounded by constant L (the length of the alphabet), as entries with x==0 can only 'spawn' at most 1 new entry in place of itself and at every point there is only 1 entry with x==1 which can spawn at most L new entries (with x==0) giving us O(n*n*L) = O(n*n).
D = ['apple', 'apple', 'banana', 'orange']
def preprocess_dict():
trie = {}
for word in D:
r = trie
for z in word:
r = r.setdefault(z, {})
r[None] = []
return trie
R = preprocess_dict()
def word_exists(word):
s = [(1,R)]
for z in word:
next_s = []
for (x,w) in s:
if z in w:
next_s += [(x,w[z])]
elif x>0:
next_s += [(x-1,r) for r in w.values()]
s = next_s
return any(None in w.keys() for (x,w) in s)
print (word_exists('applx'))
BFS with multiple sources. Something like that:
M = ['110',
'011']
C,R = len(M[0]), len(M)
def isB(i,j):
return i in [0,R+1] or j in [0,C+1] or M[i-1][j-1] == '0'
# add boundary for simpler checks
m = [[[1,0][isB(i,j)] for j in range(C+2)] for i in range(R+2)]
q = [(1,i,1) for i in range(1,R+1) if m[i][1]]
v = set()
while q:
s,i,j = q.pop()
if (i,j) in v or not m[i][j]:
continue
if j == C:
print(s)
break
v.add((i,j))
q = [(s+1,i-1,j),
(s+1,i+1,j),
(s+1,i,j-1),
(s+1,i,j+1)] + q
I find the problem description a little confusing. It follows from the example that the distance relationship does not have to hold pairwise for all points of a cluster (as someone mentioned that means finding a square that contains the most points which is a harder problem) but rather a simple transitive relationship is defined where a cluster could even be made from points arranged in straight line every 5 units apart in other words points A and B are in the same cluster if they are connected directly (within a square of 5 units of each other) or indirectly via x1 .. xn where (A,x1), (x1,x2), ... (xn, B) are within a square of 5 units of each other. In this case the problem boils down to a simple union-find.
- adr September 22, 2019def lss(special_bits, s, k):
dp = [[(0,-1)]*(len(s)+1)] + [[(0,0)]*(len(s)+1) for i in range(k+1)]
res = 0
for i,x in enumerate(s):
is_normal = special_bits[ord(x)-ord('a')] == '0'
is_special = 1-is_normal
for j in range(k+1):
normal_count, max_len = dp[j+is_special][i]
dp[j+1][i+1] = (normal_count+is_normal, max_len+1)
res = max(res, max_len+1)
return res
print (lss('01111001111111111011111111', 'giraffe', 1))
I am confused by your example. The bit mask '011110011111111110111111' is 2 characters short. Presuming you meant '01111001111111111011111111' which you mentioned the first time, the normal characters would be ['a', 'f', 'g', 'r'], however you say that 'gir' has only one normal character which does not make sense.
- adr September 22, 2019Assuming the card array is not already sorted..
def buy_cards(cards, cash):
if cards:
cards = sorted(cards)
cards = [0] + cards + [cards[-1] + cash]
res = 0
for n,m in zip(cards, cards[1:]):
if not cash:
break
d = int((-2*n - 3 + ((2*n+3)**2 + 8*(cash-n-1))**0.5)//2)
if d >= 0:
d = min(d, m-n-2)
res += d+1
cost = (d*d + d*(2*n+3) + 2*(n+1))//2
cash -= cost
return res
print(buy_cards([2,3,5], 7))
print(buy_cards([2,3,5,11], 50))
Output:
2
7
Looks like Integer linear programming problem to me. I guess the interviewer expects you to use DP here. However an alternative and more fun approach would be to use a specialized tool like z3 in Python where this kind of problems are easy to express:
from z3 import *
X = 3
Y = 2
G = [1,2,1]
K = [3,4]
a = [[Int('a_{}_{}'.format(i,j)) for j in range(Y)] for i in range(X)]
s = Optimize()
s.add(And([a[i][j] >= G[i] for i in range(X) for j in range(Y)]))
s.add(And([a[i][j] <= K[j] for i in range(X) for j in range(Y)]))
for j in range(Y):
s.add(Or([a[i][j] == K[j] for i in range(X)]))
for i in range(X):
s.add(Or([a[i][j] == G[i] for j in range(Y)]))
s.minimize(Sum([a[i][j] for i in range(X) for j in range(Y)]))
if s.check() == unsat:
print (-1)
else:
print (simplify(sum([s.model()[a[i][j]] for j in range(Y) for i in range(X)])))
print ([[s.model()[a[i][j]] for j in range(Y)] for i in range(X)])
Output:
12
[[3, 1], [2, 4], [1, 1]]
MAX_FLIPS = 1
START,END = (0,0),(2,1)
INP = ["101",
"101",
"101",
"111"]
N,M = len(INP[0]), len(INP)
cells = [(0,0)]*(N+2)*(M+2) # add border for simpler checks
for i,r in enumerate(INP):
for j,x in enumerate(r):
cells[(i+1)*(N+2)+j+1] = (int(x),0) # (cell_val, cell_state)
# i-bit set in cell_state means the cell was visited after i flipps
starti = START[0]+1 + (START[1]+1)*(N+2)
endi = END[0]+1 + (END[1]+1)*(N+2)
q = [(starti, 0, 0)] # (pos, steps, flips)
while q:
(pos, steps, flips) = q.pop()
cell_val, cell_state = cells[pos]
if (pos % (N+2)) in [0, N+1] or (pos / (N+2)) in [0, M+1]: # skip border
continue
if cell_state & ((1 << (flips+1)) - 1) or flips == MAX_FLIPS and cell_val == 0:
continue
flips += (1-cell_val)
cell_state |= (1 << flips)
if pos == endi:
print (steps)
break
cells[pos] = (cell_val, cell_state)
q = [(pos-1, steps+1, flips),
(pos+1, steps+1, flips),
(pos-N-2, steps+1, flips),
(pos+N+2, steps+1, flips)] + q
Basically this is a DP problem. g counts palindromes, f counts special palindromes..
N = 3
K = 3
g = [1,K] + [0]*(N-1)
f = [1,K] + [0]*(N-1)
for i in range(2,N+1):
z = (i+1)//2
g[i] = K*g[i-2]
f[i] = g[i]
for j in range(2,z+1):
f[i] -= f[j] * g[max(0, i-2*j)]
print (f[N])
A straightforward DP problem
N = 10
dp = [[[1] * (N+1) for _ in [0,1]] for _ in [0,1,2]]
for l in range(1,N+1):
dp[2][1][l] = dp[2][1][l-1] + dp[2][0][l-1] + dp[1][1][l-1]
dp[1][1][l] = dp[1][1][l-1] + dp[1][0][l-1] + dp[0][1][l-1]
dp[0][1][l] = dp[0][1][l-1] + dp[0][0][l-1]
dp[2][0][l] = dp[2][0][l-1] + dp[1][0][l-1]
dp[1][0][l] = dp[1][0][l-1] + dp[0][0][l-1]
print (dp[2][1][N])
If you need to compute a maximal subset where adding a new tower forces the subset to have an overlap with it then this is a Maximal Independent Set (MIS) problem. If on the other hand you need to compute a largest such subset then this is a Maximum Independent Subset problem (MaxIS).
- adr September 07, 2019We could use dynamic programming. Something like this:
def count(s,k):
dp1 = [[0]*(len(s)+1)
for _ in range(k+1)]
dp2 = [[0]*(len(s)+1)
for _ in range(k+1)]
for i in range(len(s)):
for j in range(i+1):
z = s[j:i+1]
p = [0,1][z == z[::-1]]
if j == 0:
dp1[0][i+1] = p
dp2[0][i+1] = 1
for c in range(1, min(k,j)+1):
w = dp1[c-1][j]
y = dp2[c-1][j]
dp1[c][i+1] += w + p*y
dp2[c][i+1] += y
return dp1[k][len(s)]
print count ('aabbc', 2)
Generate next element in the sequence based on the previous element. This way we don't need to store the sequence and meet the memory requirements.
def find(query, n):
ans = []
query = sorted(query)
qi = 0
z = 1
for zi in xrange(n):
if query[qi] == zi:
ans += [z]
qi += 1
if qi == len(query):
break
if z*10 <= n:
z *= 10
elif z%10 != 9 and z+1 <= n:
z += 1
else:
z /= 10
while z%10 == 9:
z /= 10
z += 1
return ans
print find([1,4], 12)
def find(cities, n):
memo = {}
def helper(cs, n):
if len(cs) == n:
return 1
tcs = tuple (cs)
if tcs in memo:
return memo.get(tcs)
z = [cs[0]-2]+cs+[cs[-1]+2]
ans = 0
for i,(a,b) in enumerate(zip(z,z[1:])):
if 0<a+1<b<n+2:
ans += helper(cs[:i]+[a+1]+cs[i:], n)
if 0<a<b-2:
ans += helper(cs[:i]+[b-1]+cs[i:], n)
memo[tcs] = ans
return ans
return helper(sorted(cities), n) % (10**9+7)
print find([2,5],5)
Binary search. Similar to painter’s partition problem.
def find(ts, n):
assert len(ts) >= n
lo = max(ts)
hi = sum(ts)+1
while lo<hi:
m = (lo+hi)/2
t_left = m
boxes_left = n-1
for t in ts:
assert t <= m
if t_left < t:
if boxes_left == 0:
lo = m+1
break
boxes_left -= 1
t_left = m-t
else:
t_left -= t
else:
hi = m
arrangement,s = [[]],0
for t in ts:
if s + t > lo:
s = 0
arrangement += [[]]
s += t
arrangement[-1] += [t]
return lo, arrangement
We are matching a 4-place predicate so the bruteforce is O(n^4). If anything, this resembles a 4-sum problem, definetely not a 3-sum as some have claimed.
The important thing to note from the description is that 3 elements matching the value of the 4th element must lie to the left of the 4th element. This lends itself to this straightforward approach:
def count(arr):
ans = 0
sums = defaultdict(int)
for i,x in enumerate(arr[3:], 3):
c = arr[i-1]
for j,a in enumerate(arr[:i-2]):
for b in arr[j+1:i-1]:
sums[a+b+c] += 1
ans += sums[x]
return ans
O(N^3) runtime.
- adr October 11, 2018Sort one of the arrays of length N. Iterate the other array of length M and do a binary search in the first array updating the global maximum. O(N*log(N) + M*log(N))
def find(F, B, T):
ans = [0, 0, 0]
F = sorted([x, i] for i,x in F)
for idy,y in B:
f = 0
end = len(F)
z = T - y
while f != end:
m = (f + end)/2
if F[m][0] <= z:
f = m+1
else:
end = m
if f != 0 and y+F[f-1][0] > ans[0]:
ans = [y+F[f-1][0], F[f-1][1], idy]
return ans[1:]
print find([(1,3000),(2,5000),(3,4000),(4,10000)],
[(1,2000),(2,3000),(3,4000)], 11000)
A greedy approach can work here. The idea is that we keep track of how many incomplete stacks there are to the left of position i (shortage array). Note that shortage[-1] - shortage[i] also gives us the information on the number of incomplete stacks to the right of position i. Then for every stack where we have some extra coins (>1) we calculate how many needs to be moved left / right based on our knowledge of incomplete stacks:
def findMinMovesToRedistributeCoins(coins):
shortage = [0 for _ in coins]
missing = 0
for i,x in enumerate(coins):
if not x:
missing += 1
shortage[i] = missing
ans = 0
while shortage[-1]:
for i,x in enumerate(coins):
extra = x - 1
if extra > 0 and i > 0 and shortage[i-1] > 0:
# move stack left
d = min(extra, shortage[i-1])
coins[i] -= d
if not coins[i-1]:
for j in range(i-1, len(coins)):
shortage[j] -= 1
coins[i-1] += d
# print coins
ans += 1
extra -= d
if extra > 0 and i+1<len(coins) and shortage[-1] != shortage[i]:
# move stack right
d = min(extra, shortage[-1] - shortage[i])
coins[i] -= d
if not coins[i+1]:
for j in range(i+1, len(coins)):
shortage[j] -= 1
coins[i+1] += d
# print coins
ans += 1
return ans
Test driver:
print findMinMovesToRedistributeCoins([0, 2])
[1, 1]
1
print findMinMovesToRedistributeCoins([0, 3, 1, 0, 3, 0, 0])
[1, 2, 1, 0, 3, 0, 0]
[1, 1, 2, 0, 3, 0, 0]
[1, 1, 1, 1, 3, 0, 0]
[1, 1, 1, 1, 1, 2, 0]
[1, 1, 1, 1, 1, 1, 1]
5
We don't need to know the length of the input streams, just a way to test if the stream is empty. Here is a C++ example to merge heterogeneous streams of arbitrary length:
#include <queue>
#include <iostream>
#include <algorithm>
#include <memory>
#include <iterator>
using namespace std;
class helper_base
{
public:
virtual ~helper_base() {}
virtual void next() = 0;
virtual int get() const = 0;
virtual bool eof() const = 0;
};
template <typename InputIterator>
class helper : public helper_base
{
InputIterator d_in;
InputIterator d_end;
public:
explicit helper(const pair<InputIterator, InputIterator>& range)
: d_in(range.first), d_end(range.second)
{}
void next() override { ++d_in; }
int get() const override { return *d_in; }
bool eof() const override { return d_in == d_end; }
};
template <typename ... InputIteratorPairs, typename OutputIterator>
void merge(OutputIterator out, InputIteratorPairs ... inpairs)
{
vector<shared_ptr<helper_base>> arr =
{make_shared<helper<decltype(inpairs.first)>>(inpairs)...};
arr.erase(remove_if(arr.begin(), arr.end(),
[](shared_ptr<helper_base> stream) { return stream->eof(); }),
arr.end());
auto cmp = [](shared_ptr<helper_base> lh, shared_ptr<helper_base> rh) {
return lh->get() > rh->get();
};
priority_queue<shared_ptr<helper_base>,
vector<shared_ptr<helper_base>>,
decltype(cmp)> pq(arr.begin(), arr.end(), cmp);
while (!pq.empty()) {
auto top = pq.top();
pq.pop();
*out++ = top->get();
top->next();
if (!top->eof()) {
pq.push(top);
}
}
}
int main()
{
const int a[] = {1, 3, 4, 5, 6, 7};
const vector<int> b = { 2, 4, 5, 9};
// merge a,b and stdin stream
merge(ostream_iterator<int>(cout, " "),
make_pair(a, a+sizeof(a)/sizeof(int)),
make_pair(b.begin(), b.end()),
make_pair(istream_iterator<int>(cin), istream_iterator<int>()));
return 0;
}
OK here's some code. It can be further optimised to not use naive union-find and to not brute-force when searching for overlapping areas. Hope it helps.
# Assume all sensors are within a room, the actual width of the room does not matter.
def canGoFromLeftToRight(roomHeight, sensors, r):
ids = range(len(sensors))
def union(i,j):
ids[find(i)] = find(j)
def find(i):
while (i != ids[i]):
ids[i] = ids[ids[i]]
i = ids[i]
return i
top = []
bottom = []
for i,[x,y] in enumerate(sensors):
if y+r >= roomHeight: # overlaps top side of the room
top += [i]
if y <= r: # overlaps bottom side of the room
bottom += [i]
if not top or not bottom:
return True
# unite all sensors overlapping the top
for i,j in zip(top, top[1:]):
union(j,i)
# unite all sensors overlapping the bottom
for i,j in zip(bottom, bottom[1:]):
union(i,j)
# unite all sensors overlapping each other
for i,[x,y] in enumerate(sensors):
for I,[X,Y] in enumerate(sensors[i+1:],i+1):
if (X-x)*(X-x) + (Y-y)*(Y-y) <= 4*r*r:
union(i,I)
return find(top[0]) != find(bottom[0])
canGoFromLeftToRight(1, [(0,0),(0.5,0.2),(0.7,0.4),(0.6,0.6),(1,1)], 0.5) # False
canGoFromLeftToRight(1, [(0,0),(0.5,0.2),(0.7,0.4),(1,1)], 0.5) # True
This is a percolation problem solvable using union-find.
The thief cannot go from left to right iff there exists a path from top to bottom via sensor coverage areas. You have a graph where two sensors are connected if their coverage areas overlap. You need to find if there exists a sensor overlapping the top side of the room that is connected with a sensor overlapping the bottom side of the room.
Solution 1: min-priority queue where the key for each point is the distance to (0,0). Heapify the input array and pop first K. Complexity: O(n+k*log(n))
Solution 2: use a selection algorithm like quick select to get Kth closest element. Complexity: O(n) average, O(n*n) worst case. You'll get a partially sorted array where Kth element is on it's place and all elements to the left are closer to (0,0) or same distance. Note that the complexity of this method does not even depend on K.
from collections import Counter
def count(arr):
fs = [(tuple(Counter(x[::2]).items()),
(tuple(Counter(x[1::2]).items())))
for x in arr]
return sum(z for (x,z) in Counter(fs).items() if z>1)
print count(["abcd","cbad","bacd"])
print count(["abcd", "acbd", "adcb", "cdba",
"bcda", "badc"])
@robb.krakow Sure. This has some similarities with union-find. Use an array to store for N elements which of the two partitions they belong to with 0 meaning unmarked/undecided. When you see a rule (a,b) you look up a,b in the array. If both of them are unmarked, i.e. rel[a] == rel[b] (we assume a != b and they cannot be already marked as belonging to the same partition as the rules are non-contradictary), we mark them as belonging to the opposite partitions. Otherwise identify which of a,b is marked and mark the other one as belonging to the opposite partition.
- adr July 27, 2018Assuming the set of rules is complete and non-contradictary, i.e after applying the rules every element can be in exactly one of the two partitions:
def part(n, k):
rel = [0]*n
for a,b in k:
if rel[a-1] == rel[b-1]:
rel[a-1] = 1
rel[b-1] = -1
else:
if rel[b-1]==0:
a,b = b,a
rel[a-1] = -rel[b-1]
r1 = []
r2 = []
for i in xrange(n):
r = r1 if rel[i] == 1 else r2
r += [i+1]
return (r1,r2)
You only need to know P and K. The list of N integers is irrelevant to the problem.
The first part is to factor P into primes. Then, if all the elements in the resultant list of prime factors are unique, there is a simple combinatorical solution: 1+sum_i=1..K-1 (C(i, L-1)), where C(k,n) = n!/k!(n-k)! and L is length of the list of primes.
I find the general case hard though. When we have duplicates in the list of primes (think of [2, 2, 2, 2, 2] where [2,2],[2,2,2] and [2,2,2],[2,2] result in the same terms, 4*8 = 8*4) I can't think of any better way than to iterate over all 2-way, 3-way, .. K-way splits of the list of prime factors, compute the resultant product terms, and count ignoring repetitions.
The idea is to accommodate "cheapest" tasks first using a cpu with the least free cores.
def count(tasks, cpus):
tasks = sorted(tasks)
heapq.heapify(cpus)
res = 0
for t in tasks:
while cpus and cpus[0] < t:
heapq.heappop(cpus)
if not cpus:
return res
cores = cpus[0]
res += 1
heapq.heapreplace(cpus, cores - t)
return res
Similar to @Flint solution but a bit shorter, O(len(seq) * (sum(seq) - magic_number)):
def is_magic(seq, magic_number):
if not seq:
return False
magic_number -= seq[0] # The first element must be positive
seq = seq[1:]
target = sum(seq) - magic_number
if (target % 2) or (target < 0):
return False
target /= 2
return reduce(lambda w, i: [w[m] or (w[m-seq[i]] if seq[i]<= m else False) for m in xrange(target+1)],
xrange(1, len(seq)),
[True]+[False if i+1 != seq[0] else True for i in xrange(target)])[-1]
Assuming you are on a 64-bit system where practically any file on your hard drive can fit in your virtual address space, you just do an external sort, mmap the resultant file and then do a usual 3sum with 3 pointers. OS will take care of (un)loading pages to/from real memory.
- adr July 20, 2018The enabling logic should be straightforward provided that you have ample memory. You could use a hash map for the counters id -> (timestamp, count). If someone makes a request you check if there is an entry. If there isn't or if the timestamp in the entry is less than the current timestamp-X, add/replace it with (timestamp, 0). Otherwise, if the counter is less than Y, replace the entry with (timestamp, count+1). Otherwise, block the request. You will also want to limit the size of the map if the range of ids is too big by evicting old values. Different strategies could be used like having a priority queue timestamp -> id where the top element in the queue is the oldest. Before making checks against the hash map you could repeatedly pop elements off the queue if they are too old and remove the corresponding elements from the hash map.
- adr July 18, 2018So, if I understand the problem description, you are given (timestamp, id) data stream and asked to compute the most frequent ids falling into the specified time window?
This is a well known problem. We need to know the size of the data stream that fits in the biggest time window of interest as compared to the available memory. If you need to compute the exact top-K heavy hitters it will require O(N) memory (it's proved that even exact top-1 requires O(N) space worst-case), where N is the biggest number of data points fitting into the time window. If you can satisfy this requirement, then it's one thing and the solution is relatively easy. On the other hand (I suspect this is what the interviewer wanted to hear) if the size of the stream is too huge and you can tolerate errors, then it calls for an approximation solution like lossy counting etc. see https://en.m.wikipedia.org/wiki/Streaming_algorithm#Frequent_elements
See also here https://cstheory.stackexchange.com/questions/19802/top-k-frequent-items-in-data-stream
@Flint The problem can be rephrased as such: your input is an integer K>0, and a sequence S = s1 s2 .. sn, 1<=si<=K, 1<=i<=n. Consider a set Q of all possible sequences q1 q2 .. qz, 1 <= qi <= K, z >= 1 that are not subsequences of S (that cannot be derived from S by deleting some or no elements without changing the order of the remaining elements). You need to find the length of the shortest sequence in Q.
- adr July 15, 2018@aonecoding provided a cool solution. It took me a while to convince myself that it actually works. Here is the sketch of a proof if you are interested:
1) If an input string S does not contain all K characters, then the answer is obviously 1, as any missing character forms a string that is not a substring of S
2) Otherwise, an input string S can be represented as a concatenation S1 S2 where
S1 is the *shortest* prefix of S that contains all K characters and (a possibly empty) suffix S2.
Now, if the suffix S2 contains all K characters it can also be represented as a concatenation of a shortest prefix that contains all K characters and (a possibly empty) suffix.
Repeating this process until we get a suffix that does not contain all K characters, an input string S can uniquely be represented as a concatenation of S1 S2 .. Sn where Si (i in [1..n-1]) is the shortest prefix of the rest of S (to be more precise, of the suffix of S produced by taking S1 S2 .. Si-1 prefix off) containing all K characters, and Sn - is the (possibly empty) suffix that does not contain all K characters (meaning some of K characters are not in Sn).
3) Since Si is the shortest prefix with all K characters, its last character occurs only once (otherwise this character could be omitted producing a shorter prefix with all K characters).
4) This is how to build a string that is guaranteed to not be a sub-sequence of S and also having minimum length among other strings having this property, meaning that any shorter string is necessarily a sub-sequence of S: s1 s2 .. sn, where si (i in [1..n-1]) is the last character of Si, and sn is any of the K characters not present in Sn.
First, let's prove that any shorter sequence q1 q2 .. qk, k<n is a sub-sequence of S. This part is obvious since qi is in Si, since Si contains all K characters (i in [1..k]).
Now, let's prove that the string s1 s2 .. sn is not a sub-sequence of S. Let's assume otherwise. Since S1 contains s1 only as the last character, either the whole s1 s2 .. sn is a subsequence of S2 .. Sn, or just s2 s3 .. sn is a subsequence of S2 .. Sn. Either way, s2 s3 .. sn is a subsequence of S2 S3 .. Sn. Now, since S2 contains s2 only as the last character, s3 .. sn is a subsequence of S3 .. Sn. By repeating this n-1 times we get that sn is a subsequence of Sn, but this cannot be true because sn was specifically chosen as one of the K characters not in Sn. We get a contradiction which means s1 s2 .. sn is not a subsequence of S.
Now, having proved that s1 .. sn is the minimal string that is not a subsequence of S, we see that in order to compute n we just need to break up the input string S into concatenations S1 .. Sn-1 Sn, as outlined in 2) and count the number of these. This is exactly what the algorithm is doing.
Sounds like you only need 10s worth of data for your check so you could use a priority queue where the key is the timestamp and the value is (word, timestamp). The top element in the queue is the one with the smallest timestamp (oldest). So every time the iterator is used it could repeatedly remove the top element (if it is too old) both from the queue and from the hash map until the oldest stored element's timestamp is not smaller than the current timestamp - 10s.
- adr July 05, 2018
Repjohnsantana9ytt9, Backend Developer at Accolite software
I am working as a Support service manager at Pro Star Garden Management . I had a different experience while working ...
Rephuslafenena, Backend Developer at Absolute Softech Ltd
I am working as a Teletype operator at Midwest TV & Appliance . It's been almost ten years since I've ...
Repathenajbarnarda, Android Engineer at ABC TECH SUPPORT
I am a graduate in Civil Engineering with nearly 7 years of experience in planning and implementing technical solutions for ...
RepSaanviJones, abc at 8x8
I am working as an office worker in a softage company. I am responsible for an office assistant position in ...
RepLicholsLowry, Associate at AMD
I am Lichols, I am an outreach worker in Desmonds Formal Wear company .I specialise in a variety of different ...
Repericsumm059, Backend Developer at ASU
My name is EricSummey . I am working as a Sound engineering technician at Foreman & Clark . as a sound technician it ...
RepEileenWenda, DIGITAL MARKETING at Accenture
I am Eileen , a football Referee skilled at maintaining a safe environment for both players and observers, inspecting the playing ...
Repwoodscarla116, Associate at ASAPInfosystemsPvtLtd
I am a nurse . My name is CarlaWoods . I am working as a Clinical social worker. I met many people ...
Reprginiarist, DIGITAL MARKETING at Accenture
I am Rginia Credentialing Specialists ensure that medical staff members' maintain current credentials and licenses to work legally in their ...
Repvickidsmithv, Android Engineer at 247quickbookshelp
Hello, I am a Managing editor. I have completed my studies from New York. Apart from this, whenever I am ...
Construct a trie based on rules. If while processing the end of the rule we discover that a different value was already inserted for the node we abandon rule processing and print 'ambiguous'.
- adr October 25, 2019