Given a paragraph of text, wri

Amazon Interview Question for Developer Program Engineers

0

of 0 votes

9
Answers
Given a paragraph of text, write a program to find the first shortest sub-segment that contains each of the given k words at least once. A segment is said to be shorter than other if it contains less number of words
- sameer.careercup April 01, 2014 in India | Report Duplicate | Flag | PURGE
Amazon Developer Program Engineer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: India
Interview Type: In-Person

More Questions from This Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 1 vote

1) Create HashMap <String, Index> which maps a word to its latest position in the text.
2) Create priority Queue which contains indexes of the words which have been already found.
(Priority queue stores sorted elements so first element in the queue is the beginning of the segment.
3) Every time a matching word is found on the new position update queue:
- remove previous position and add new one
4) If all words have been found check if segment is smaller then the minimum.
Java:

public static void findSegment(String[] text, String[] dict) {
        Map<String, Integer> wordToIndexMap = new HashMap<String, Integer>();
        Queue<Integer> positions = new PriorityQueue<Integer>();
        int missingWords = dict.length;
        int minDist = Integer.MAX_VALUE;
        int startIndex = -1;
        int endIndex = -1;
        for (int i = 0; i < dict.length; i++) {
            wordToIndexMap.put(dict[i], null);
        }
        for (int i = 0; i < text.length; i++) {
            if (!wordToIndexMap.containsKey(text[i])) {
                continue;
            }
            if (missingWords > 0 && wordToIndexMap.get(text[i]) == null) {
                missingWords--;
            }
            if (wordToIndexMap.get(text[i]) != null) {
                positions.remove(wordToIndexMap.get(text[i]));
            }
            wordToIndexMap.put(text[i], i);
            positions.add(i);
            if (missingWords == 0 && (i - positions.peek() + 1 < minDist)) {
                minDist = i - positions.peek() + 1;
                startIndex = positions.peek();
                endIndex = i;
            }
        }
        if (missingWords == 0) {
            System.out.println("Segment starts: " + startIndex + ", ends: " + endIndex);
        } else {
            System.out.println("Segment not found");
        }
    }

- hulkthedestroyer April 02, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

"First" and "Shortest" are two different things.

Let N = length of paragraph and W = length of word list
The following is an O(N) time and O(W) space solution

The idea is that you keep two pointers to the beginning and end of the window. Also, keep a list of the number of occurrences of each of the given words in the window.

A) Increase the end-pointer and each time, if the newly seen word belongs to our list, add its counter
B) If you have seen all the required pairs, report the window (add it to "pairs")
C) Increase the begin pointer under either of the following conditions:
C-1) The word begin-pointer refers to does not belong to the list
C-2) The word belongs to the list but it has occurred more than once in the window (makes the window shortest possible)

Continue until you reach last.

Complexity analysis: O(W) space to remember the count and O(N) time for the while loop.
To see the O(N) time note that each word in paragraph is visited "at most" twice, once by end-pointer and once by begin-pointer.

words = ['list', 'of', 'words'];
string = 'this part of text that includes words and lists but not list of words and if you need a list which contains many more words you should try a longer list of terms or list of words'
para = string.split(' ');

count_words = dict()
for w in words:
    count_words[w] = 0

pos_begin = 0
pos_end = 0
not_seen = len(words)

pairs = []
while pos_end < len(para):
    if para[pos_end] in words:
        if count_words[para[pos_end]] == 0:
            not_seen -= 1
        count_words[para[pos_end]] += 1
    
    
    while pos_begin < pos_end:
        if para[pos_begin] in words:
            if count_words[para[pos_begin]] == 1:
                break
            else:
                count_words[para[pos_begin]] -= 1
        pos_begin += 1
    
    # We might have found a shorted window
    if not_seen == 0:
        pairs.append([pos_begin, pos_end])
    
    pos_end += 1

print pairs

- Ehsan April 01, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

this seems it will return first always i need smallest subpara in the whole. hope i am able to clarify.

- sameer.careercup April 01, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

It returns all pairs. Add this to get the min:

min_val = len(para)
 	window = [0, min_val - 1];
	for k in range(0, len(pairs)):
        	if pairs[k][1] - pairs[k][0] < min_val:
			min_val = pairs[k][1] - pairs[k][0]
			window = pairs[k]

- Ehsan April 01, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

How is this solution O(W)? when you split the string, you will get an array of all the words in the paragraph. This is (n- (spaces)). which is O(N). right?

- Anonymous April 01, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I should say O(W) extra memory. The paragraph is the input anyway. I mean, I assumed it is the input to the function.
In case it is being streamed in, we need a buffer and O(N) space.

- Ehsan April 02, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Question about this statement you are making for a condition to advance the begin pointer :

"The word belongs to the list but it has occurred more than once in the window (makes the window shortest possible) ".

That logic seems flawed in that you might inadvertently skip over another "found" word in the list. So in a sentence like

"Is this the first time I have seen the word first or is it the second"

where the target words are "first | seen | second", I think your logic would improperly advance the begin pointer when you run into the second occurrence of the word "first"... skipping right past "seen" and returning an false shortest window.

- johny418 April 02, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Thanks for the comment.
You only skip over a word if
A) it does not belong to the list
B) it belongs to the list but it has been recorded more than once.

In your example if "first" has been recorded multiple times, then we skip over it. But when we reach "seen", we will check if another word "seen" has been spotted before (by the end-pointer at some previous iteration). If so, the count has to be > 1 and we can skip over it. Otherwise, the loop breaks and "begin-pointer" will not increase.
Note that I only count a word when the end-pointer reaches one.

- Ehsan April 02, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

leetcode.com/2010/11/finding-minimum-window-in-s-which.html...modify the solution for words instead of chars

- Nitin April 02, 2014 | Flag Reply

CareerCup

Amazon Interview Question for Developer Program Engineers

Books

Videos

Resume Review

Mock Interviews