## Amazon Interview Question for Software Engineer / Developers

Team: Kindle
Country: United States
Interview Type: In-Person


Actually the interviewer was right that it is not optimal.
The reason is that you have to hash a word (first pass) and then, after the lookup, compare it with the already-existing one (second pass).

The optimal solution is to use a trie and add the words to it. When you reach the terminal leaf, increase its counter and compare it to the current top 3 (starting from the last, least popular item). If the word is already in the top 3, just increase its value in the array and check against the next item; if it's larger, swap them.

The top-3 array could be replaced by a linked list that always drops the last item, but that requires a lot of garbage collection, so a fixed array is preferable.

Generally, hash tables are really good, but keep in mind that their average collision rate with strings is high, and they require resizing all the time.

Tries, on the other hand, are static; there is no need to resize them.


A second pass is not required if we maintain the current top 3 as we go through the book hashing words. Inside the hash map, every word is also stored together with its count.
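A minimal sketch of that single-pass idea in Java (the class name, method, and sample input are assumptions for illustration): the count and the running top-3 ranking are both updated as each word is scanned, with no second pass over the map.

```java
import java.util.HashMap;
import java.util.Map;

public class SinglePassTop3 {
    // Returns the top-3 words, most frequent first, maintaining the ranking
    // while counting -- no separate pass over the hash map afterwards.
    static String[] top3(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        String[] top = new String[3];
        for (String w : words) {
            int c = counts.merge(w, 1, Integer::sum); // count while scanning
            // find w's current slot in the ranking, if any
            int pos = 3;
            for (int i = 0; i < 3; i++) {
                if (w.equals(top[i])) { pos = i; break; }
            }
            // bubble w upward while it beats the word above it
            while (pos > 0 && (top[pos - 1] == null || counts.get(top[pos - 1]) < c)) {
                if (pos < 3) top[pos] = top[pos - 1];
                pos--;
            }
            if (pos < 3) top[pos] = w;
        }
        return top;
    }
}
```

Since the ranking has a fixed size of 3, each update is O(1) and the whole thing stays O(N).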


.. where by 'tire' you mean 'trie', right? Also, you don't need any particular structure to keep the top 3; they could just be separate variables. But an array of 3 is probably more convenient.


Min heap of size 3 should help to track top 3.
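A sketch of the size-3 min-heap approach in Java (class and method names are assumptions): after counting, the heap's root is always the weakest of the current top 3, so it is the entry to evict when a better candidate appears.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class Top3WithMinHeap {
    static List<String> top3(String[] words) {
        // First, count every word.
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);

        // Min-heap ordered by count; cap its size at 3.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.comparingByValue());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > 3) heap.poll(); // evict the smallest of the four
        }

        // Drain the heap (smallest first) and reverse for most-frequent-first.
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) result.add(heap.poll().getKey());
        Collections.reverse(result);
        return result;
    }
}
```

With M distinct words, the heap phase is O(M log 3) = O(M), so the overall cost stays O(N).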


Not optimal in which sense?

Theorem:
Every correct algorithm that solves this problem is Omega(N).

Proof:
Assume not; that is, assume there exists an algorithm O* such that, for every input I, O* finds the top 3 most frequently used words in a text file in time asymptotically faster than Theta(N).

Then, for every input, O* does not check every single word in the file; that is, it decides the top-3 property without looking at every word, so at least one word goes unexamined. It is then easy to build an instance K in which the words are otherwise identically distributed and that single unexamined word is the tie-breaker: the algorithm produces an incorrect result because it never considers that word, contradicting our assumption that O* is correct.

What was the running time of your algorithm?
And what about map-reduce? Wouldn't this be a good fit?


Can you explain it more clearly?


You are correct, i.e., in the sense that you are only considering time complexity. The hashing technique would give exponential, O(26^N), space complexity, which is clearly unacceptable. In fact, a two-trie is better than even a trie for this case, as it has the time complexity of a trie but space complexity better than a trie's.


- Use an unordered hash map + a min heap

- Use an ordered map (like STL `map`, which uses a balanced tree)


Use a max heap


One possible solution:
Sort the words in the file, which is O(N log N); then, with a single pass over the sorted output, we can find the top 3 words.
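A sketch of the sort-then-scan idea in Java (the class and method names are assumptions): sorting groups equal words into runs, so one pass can measure each run and remember the three longest.

```java
import java.util.Arrays;

public class SortThenScan {
    static String[] top3(String[] words) {
        String[] sorted = words.clone();
        Arrays.sort(sorted); // O(N log N): equal words become adjacent runs

        String[] top = new String[3];
        int[] best = new int[3];
        int run = 0;
        for (int i = 0; i < sorted.length; i++) {
            run++;
            boolean endOfRun = i == sorted.length - 1 || !sorted[i].equals(sorted[i + 1]);
            if (endOfRun) {
                // Insert (sorted[i], run) into the descending top-3 list if it qualifies.
                for (int j = 0; j < 3; j++) {
                    if (run > best[j]) {
                        for (int k = 2; k > j; k--) {
                            best[k] = best[k - 1];
                            top[k] = top[k - 1];
                        }
                        best[j] = run;
                        top[j] = sorted[i];
                        break;
                    }
                }
                run = 0;
            }
        }
        return top;
    }
}
```

This trades the hash map's extra memory for the O(N log N) sort.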


- Use a list of pair<string,int>, where the int member holds the count of that word. Now sort by the second member.
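A sketch of the pair-list idea in Java (class and method names are assumptions), using `Map.Entry` as the (word, count) pair and sorting the list by the second element:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PairListSort {
    static List<String> top3(String[] words) {
        // Count every word first.
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);

        // Turn the map into a list of (word, count) pairs and sort by count, descending.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>(counts.entrySet());
        pairs.sort((a, b) -> b.getValue() - a.getValue());

        // The first three pairs are the answer.
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(3, pairs.size()); i++)
            result.add(pairs.get(i).getKey());
        return result;
    }
}
```

Sorting all M distinct pairs is O(M log M), which is more work than necessary for a fixed top 3, but the code is short.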


It doesn't matter whether you do one or two passes over the data; the number of passes is a constant factor, so the interviewer is NOT correct.

Both O(2N) and O(N) are O(N).

The solution the OP provided is asymptotically as good as the so-called 'optimal' solution from the interviewer.


Sorry, iPad autocorrection did not help :)


Errrr... there is time complexity and there is space/memory complexity. You are right that there are quite a few O(N) algorithms (in time!), but we should consider memory usage as well.

Without knowing the total number of words, the number of distinct words, their distribution, etc., it is hard to say more.

The funny side of it is that the answer is most probably always "a", "the", "in" for books written in English (or something like that) :-) and, if I am not mistaken, a very similar list of the 3 most common words can be compiled for every language.


You are absolutely right.
That is why I asked the OP: optimal in which sense?

I admit I would have answered in a similar manner. I am certainly not an expert, but I think that as the file size increases, a map-reduce approach could be a sweet solution; this problem seems tailored for some kind of parallel processing.


For God's sake, stop saying "max heap". How are you planning to find which node to remove when something bigger than the root comes along? Are you going to traverse the whole heap to find the minimum?


What?
I think they mean
=> Get word counts, given a dictionary.
=> Max Heapify
=> Delete 3.
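A sketch of that heapify-then-delete reading in Java (class and method names are assumptions): build a max heap over all (word, count) entries and poll the root three times. As the complaint above notes, this holds all M distinct words in the heap, unlike the size-3 min-heap variant.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class MaxHeapTop3 {
    static String[] top3(String[] words) {
        // Get word counts.
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);

        // Max-heapify: comparator inverts the count order.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> b.getValue() - a.getValue());
        heap.addAll(counts.entrySet());

        // Delete 3: each poll removes the current maximum.
        String[] top = new String[Math.min(3, heap.size())];
        for (int i = 0; i < top.length; i++)
            top[i] = heap.poll().getKey();
        return top;
    }
}
```

This is O(M) to heapify plus O(log M) per deletion, so it is asymptotically fine; the objection is purely about memory.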


Hello,
For questions like this, is code not required? No one has posted complete code so far. Or do they just ask for the idea and the algorithm?


Hash up the text, then apply selection (as in selection sort) to it for the kth largest value.
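A sketch of that idea in Java (class and method names are assumptions): count with a hash map, then run only the first 3 rounds of selection, each round scanning for the current maximum and removing it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartialSelection {
    static List<String> top3(String[] words) {
        // Hash up the text: word -> count.
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);

        // Three rounds of selection, like the first 3 passes of selection sort.
        List<String> result = new ArrayList<>();
        for (int round = 0; round < 3 && !counts.isEmpty(); round++) {
            String best = null;
            for (Map.Entry<String, Integer> e : counts.entrySet())
                if (best == null || e.getValue() > counts.get(best))
                    best = e.getKey();
            result.add(best);
            counts.remove(best); // so the next round finds the next largest
        }
        return result;
    }
}
```

This costs O(3M) over M distinct words, so it avoids a full sort, at the price of mutating the count map.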


Here is the code I wrote using a prefix tree. This program prints the n most frequently used words in a file. For simplicity I read from an array instead of a file; to read from a file, read it line by line, split each line into an array of words, and add each word using this program.

```java
package example.concepts;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class TrieTreeExample {

    public static void main(String[] args) {
        int n = 3; // number of most-occurring words to track
        TriTree tree = new TriTree(n);
        String[] words = {"prashant", "pras", "pras", "pras", "prashant", "pras",
                "abc", "test", "test", "prashant", "abc", "hhh", "bbb", "bbb",
                "pras", "bbb", "prashant", "prashant", "ab", "ab", "ab", "pras"};
        for (String word : words)
            tree.insertWord(word);

        tree.printTree(new StringBuilder(), tree.root);

        System.out.println("Is exist: " + tree.findWord("pras"));
        System.out.println("Max words: " + Arrays.asList(tree.maxWord));
        System.out.println("Max counts: " + Arrays.toString(tree.maxCount));
    }
}

class TriTree {
    TriNode root;
    int[] maxCount;   // counts of the n most frequent words, descending
    String[] maxWord; // the corresponding words

    TriTree(int n) {
        root = new TriNode();
        maxCount = new int[n];
        maxWord = new String[n];
    }

    void insertWord(String word) {
        TriNode current = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            TriNode child = current.children.get(c);
            if (child == null) {
                child = new TriNode(c, false, 0);
                current.children.put(c, child);
            }
            current = child;
        }
        current.fullWord = true;
        current.count++;
        updateMax(current, word);
    }

    // Keep maxWord/maxCount sorted by count, descending.
    private void updateMax(TriNode node, String word) {
        int pos = -1;
        for (int i = 0; i < maxWord.length; i++) {
            if (word.equals(maxWord[i])) { pos = i; break; }
        }
        if (pos == -1) {
            int last = maxCount.length - 1;
            if (node.count > maxCount[last]) { // evict the weakest entry
                maxCount[last] = node.count;
                maxWord[last] = word;
                pos = last;
            }
        } else {
            maxCount[pos] = node.count;
        }
        // Bubble the updated entry up while it beats the one above it.
        for (int i = pos; i > 0 && maxCount[i] > maxCount[i - 1]; i--) {
            int c = maxCount[i]; maxCount[i] = maxCount[i - 1]; maxCount[i - 1] = c;
            String w = maxWord[i]; maxWord[i] = maxWord[i - 1]; maxWord[i - 1] = w;
        }
    }

    boolean findWord(String word) {
        TriNode current = root;
        for (int i = 0; i < word.length(); i++) {
            current = current.children.get(word.charAt(i));
            if (current == null)
                return false;
        }
        if (!current.fullWord)
            return false;
        System.out.println("count of word " + word + " is: " + current.count);
        return true;
    }

    // Depth-first traversal: print every full word with its count.
    void printTree(StringBuilder prefix, TriNode node) {
        if (node == null)
            return;
        if (node.fullWord)
            System.out.println("Word: " + prefix + "  Count: " + node.count);
        for (Map.Entry<Character, TriNode> e : node.children.entrySet()) {
            prefix.append(e.getKey());
            printTree(prefix, e.getValue());
            prefix.deleteCharAt(prefix.length() - 1);
        }
    }
}

class TriNode {
    char value;
    boolean fullWord;
    int count;
    Map<Character, TriNode> children = new HashMap<>();

    TriNode() {
        this('\0', false, 0);
    }

    TriNode(char value, boolean fullWord, int count) {
        this.value = value;
        this.fullWord = fullWord;
        this.count = count;
    }
}
```


Correct, dude. The best and optimal solution.
