Amazon Interview Question
Developer Program EngineersCountry: India
Interview Type: In-Person
"First" and "Shortest" are two different things.
Let N = length of paragraph and W = length of word list
The following is an O(N) time and O(W) space solution
The idea is that you keep two pointers to the beginning and end of the window. Also, keep a list of the number of occurrences of each of the given words in the window.
A) Increase the end-pointer and each time, if the newly seen word belongs to our list, add its counter
B) If you have seen all the required pairs, report the window (add it to "pairs")
C) Increase the begin pointer under either of the following conditions:
C-1) The word begin-pointer refers to does not belong to the list
C-2) The word belongs to the list but it has occurred more than once in the window (makes the window shortest possible)
Continue until you reach last.
Complexity analysis: O(W) space to remember the count and O(N) time for the while loop.
To see the O(N) time note that each word in paragraph is visited "at most" twice, once by end-pointer and once by begin-pointer.
words = ['list', 'of', 'words'];
string = 'this part of text that includes words and lists but not list of words and if you need a list which contains many more words you should try a longer list of terms or list of words'
para = string.split(' ');
count_words = dict()
for w in words:
count_words[w] = 0
pos_begin = 0
pos_end = 0
not_seen = len(words)
pairs = []
while pos_end < len(para):
if para[pos_end] in words:
if count_words[para[pos_end]] == 0:
not_seen -= 1
count_words[para[pos_end]] += 1
while pos_begin < pos_end:
if para[pos_begin] in words:
if count_words[para[pos_begin]] == 1:
break
else:
count_words[para[pos_begin]] -= 1
pos_begin += 1
# We might have found a shorted window
if not_seen == 0:
pairs.append([pos_begin, pos_end])
pos_end += 1
print pairs
this seems it will return first always i need smallest subpara in the whole. hope i am able to clarify.
It returns all pairs. Add this to get the min:
min_val = len(para)
window = [0, min_val - 1];
for k in range(0, len(pairs)):
if pairs[k][1] - pairs[k][0] < min_val:
min_val = pairs[k][1] - pairs[k][0]
window = pairs[k]
How is this solution O(W)? when you split the string, you will get an array of all the words in the paragraph. This is (n- (spaces)). which is O(N). right?
I should say O(W) extra memory. The paragraph is the input anyway. I mean, I assumed it is the input to the function.
In case it is being streamed in, we need a buffer and O(N) space.
Question about this statement you are making for a condition to advance the begin pointer :
"The word belongs to the list but it has occurred more than once in the window (makes the window shortest possible) ".
That logic seems flawed in that you might inadvertently skip over another "found" word in the list. So in a sentence like
"Is this the first time I have seen the word first or is it the second"
where the target words are "first | seen | second", I think your logic would improperly advance the begin pointer when you run into the second occurrence of the word "first"... skipping right past "seen" and returning an false shortest window.
Thanks for the comment.
You only skip over a word if
A) it does not belong to the list
B) it belongs to the list but it has been recorded more than once.
In your example if "first" has been recorded multiple times, then we skip over it. But when we reach "seen", we will check if another word "seen" has been spotted before (by the end-pointer at some previous iteration). If so, the count has to be > 1 and we can skip over it. Otherwise, the loop breaks and "begin-pointer" will not increase.
Note that I only count a word when the end-pointer reaches one.
1) Create HashMap <String, Index> which maps a word to its latest position in the text.
2) Create priority Queue which contains indexes of the words which have been already found.
(Priority queue stores sorted elements so first element in the queue is the beginning of the segment.
3) Every time a matching word is found on the new position update queue:
- remove previous position and add new one
4) If all words have been found check if segment is smaller then the minimum.
Java:
- hulkthedestroyer April 02, 2014