Microsoft Interview Question
Software Engineer / Developer
Team: Bing
Country: United States
Interview Type: Phone Interview
Step 1: Fetch one word at a time and store it in a trie, with a counter per word that tracks its number of occurrences. Each time a word is inserted into the trie, increment its counter by one, so at the end it holds that word's total number of occurrences in the whole text.
Step 2: Iterate over the trie and return the words that occur exactly n times.
Total complexity: O(size of the text)
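The two steps above can be sketched as follows. This is a minimal sketch, assuming lowercase a-z words; `TrieNode`, `insert`, and `collect` are illustrative names, not from the original post.

```cpp
#include <array>
#include <memory>
#include <string>
#include <vector>

// Minimal lowercase-word trie; `count` holds how many times a word was inserted.
struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 26> child;
    int count = 0;
};

// Step 1: walk/extend the trie along the word and bump its counter.
void insert(TrieNode* root, const std::string& word) {
    TrieNode* cur = root;
    for (char c : word) {
        int i = c - 'a';
        if (!cur->child[i]) cur->child[i] = std::make_unique<TrieNode>();
        cur = cur->child[i].get();
    }
    ++cur->count;  // one more occurrence of this word
}

// Step 2: depth-first walk collecting every word whose count is exactly n.
void collect(const TrieNode* node, std::string& prefix, int n,
             std::vector<std::string>& out) {
    if (node->count == n) out.push_back(prefix);
    for (int i = 0; i < 26; ++i) {
        if (node->child[i]) {
            prefix.push_back('a' + i);
            collect(node->child[i].get(), prefix, n, out);
            prefix.pop_back();
        }
    }
}
```

Insertion is O(length of the word), so the total work is linear in the size of the text, as the post claims.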
@raja
You are right, but a trie can require a lot of space.
Is there a solution that avoids a trie?
You can use a hash table instead of a trie, which can save a considerable amount of space. The key would be the word and the value would be the frequency.
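A minimal sketch of the hash table idea, using `std::unordered_map` (the function name `wordsWithFrequency` is mine, not from the post):

```cpp
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Count word frequencies with a hash table, then return the words that
// occur exactly n times. One pass over the text plus one pass over the
// distinct words: O(total words) expected time.
std::vector<std::string> wordsWithFrequency(const std::string& text, int n) {
    std::unordered_map<std::string, int> freq;
    std::istringstream in(text);
    std::string w;
    while (in >> w) ++freq[w];       // whitespace-delimited tokenization

    std::vector<std::string> out;
    for (const auto& kv : freq)
        if (kv.second == n) out.push_back(kv.first);
    return out;
}
```

Compared to a trie, the hash table stores each distinct word once as a contiguous string, which is usually more compact than one node per character.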
Parse the input file and store words in an STL map<word, frequency>. Note that you cannot simply delete an entry as soon as its frequency reaches n+1: if the word appears again later, it would be re-inserted with a fresh count of 1 and could be reported incorrectly. Instead, keep the entry but cap its count at n+1. At the end of the file, the map entries with frequency exactly n are the words with exactly n occurrences.
let F be the original file
read F and write a new file, say N, such that each word of F is on its own line in N - takes O(m) for m words
sort N using external merge sort - takes O(m log m)
scan N and list the words whose duplicate counts equal n - takes O(m)
So the total running time is O(m) + O(m log m) + O(m) = O(m log m)
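An in-memory stand-in for the sort-then-scan steps (the real version would use an external merge sort on file N; `viaSorting` is an illustrative name):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sort the words, then one linear scan measures each run of equal
// words; a run of length exactly n is a hit.
std::vector<std::string> viaSorting(std::vector<std::string> words, int n) {
    std::sort(words.begin(), words.end());   // O(m log m)
    std::vector<std::string> out;
    for (size_t i = 0; i < words.size();) {
        size_t j = i;
        while (j < words.size() && words[j] == words[i]) ++j;  // run of equal words
        if (j - i == static_cast<size_t>(n)) out.push_back(words[i]);
        i = j;
    }
    return out;
}
```

After sorting, all copies of a word are adjacent, so counting duplicates needs only one pass and O(1) extra state.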
One way that I think is faster in practice is to:
1: Copy the content of the file into memory (or another file). We will call this 'data'.
2: Take the first word and search for it in the data.
3: Count how many times the word occurs, and overwrite each occurrence with whitespace.
4: Do the same for the next unprocessed word.
Overwriting the memory ensures that we never scan the same word twice. Also, if we use Boyer-Moore for the string search, the overwritten whitespace chunks will be skipped in long jumps, so you get a nice performance boost after the first few words. (Note that each distinct word still triggers a full scan, so the worst case is quadratic in the number of words.)
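A simplified sketch of the blank-out idea, working on a tokenized copy of the text rather than raw bytes with Boyer-Moore (the name `viaBlanking` is mine):

```cpp
#include <sstream>
#include <string>
#include <vector>

// For each word not yet processed, count every occurrence in the copy
// and blank those slots so no word is ever counted twice. Worst case
// O(m^2) comparisons for m words.
std::vector<std::string> viaBlanking(const std::string& text, int n) {
    std::vector<std::string> words;
    std::istringstream in(text);
    for (std::string w; in >> w;) words.push_back(w);

    std::vector<std::string> out;
    for (size_t i = 0; i < words.size(); ++i) {
        if (words[i].empty()) continue;        // already blanked out
        std::string w = words[i];              // copy before blanking it
        int count = 0;
        for (size_t j = i; j < words.size(); ++j) {
            if (words[j] == w) { ++count; words[j].clear(); }  // "whitespace" it
        }
        if (count == n) out.push_back(w);
    }
    return out;
}
```

The inner loop shrinks as more slots are blanked, which mirrors the claimed speed-up, but the asymptotic worst case is still quadratic.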
Create a linked list in which each node holds a word and a frequency.
For each new word you read from the file, do the following:
Compare the word with stored words in the list.
If the word is already there, increase the frequency.
If the word is not there, add the new element to the list with the new word and frequency = 1.
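The linked-list bookkeeping above can be sketched like this (node cleanup is omitted for brevity; `addWord` and `frequencyOf` are illustrative names):

```cpp
#include <string>

// Singly linked list of (word, frequency) nodes, per the steps above.
// Each lookup walks the list, so m words over d distinct words cost O(m*d).
struct Node {
    std::string word;
    int freq;
    Node* next;
};

// Increment the word's frequency if present, otherwise prepend a new node.
void addWord(Node*& head, const std::string& w) {
    for (Node* p = head; p; p = p->next)
        if (p->word == w) { ++p->freq; return; }   // word already stored
    head = new Node{w, 1, head};                   // new word, frequency = 1
}

// Linear scan; returns 0 for a word that never appeared.
int frequencyOf(const Node* head, const std::string& w) {
    for (const Node* p = head; p; p = p->next)
        if (p->word == w) return p->freq;
    return 0;
}
```

This works, but every insertion pays a full list traversal, which is exactly the cost the follow-up reply objects to.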
Searching the linked list for every word costs O(number of distinct words) per lookup, so a hash table is better in this case.
- vijay January 14, 2012