Amazon Interview Question
Software Engineer in Testsonce you have all the frequencies in the map, since the map is unordered, we need to sort the map based on values. Use collections.sort passing the entry set and comparator as arguments to it. comparator's compare to function can sort the entries based on the values(frequencies).. Thus your map is sorted and you can print the values of top K elements from the map.
Map<String,Integer> sortByValue(Map<string,Integer> m) {
List<Map.Entry<String,Integer>> list = new LinkedList<Map.Entry<String,Integer>>(map.entryset());
Collections.sort(list, new Comparator<Map.Entry<String,Integer>>(){ public int compare(Map.Entry<String,Integer> m1, Map.Entry<String,Integer> m2){
return (m2.getValue()).CompareTo(m1.getValue());
}
});
Map<String,Integer> result = new LinkedHashMap<String,Integer>();
for(Map.Entry<String,Integer> entry : list) {
result.put(entry.getKey(),entry.getValue());
}
}
In a very large file find the 3 most frequent words
Sort the words in the file, now just walk through the list. This is O(nlogn). Better algorithm will be to hash each word, and then walk through the hash and report three words with max occurences.
Also tell the number of occurrences of each word
Above idea will suffice.
Well it's a "very large" file. The idea is that there isn't as much memory. So holding the complete data structure in memory is not possible.
Just like in External merge sort, you sort chunks of file and write all the sorted chunks to a temporary file. Now in the memory, create k chunks(k = number of chunks in file), although the chunk size in memory is not as big as it is in the file. This is fine since we'd load the remaining data in the memory chunk from the file chunk as we reach the end of memory chunk. Create a min-heap using the top elements (min elements) of each memory chunk. Now just keep Extracting the min element and filling in the heap. Since you are getting all the elements in sorted order, it's easy to keep track of 3 most used keys so far.
Also tell the number of occurrences of each word
- neo April 12, 2010