Amazon Interview Question
Country: India
HashTable contains keys and values
keys --> String/word
values-->reference to an wordCounts
class WordCounts
{
public int count=0;
public void increment()
{
count++;
}
}
Max-heap should be built based on count in wordCounts
I have another solution.
HashTable + ArrayList.
I can not provide explanation now. busy... Sorry
@Vincent
i think hash-table & min-heap would be helpful here.
using max-heap would lead to more operations in terms of maintaining the heap whenever a new word comes.
Using min-heap we can directly swap the new word if it's count is > root (min-heap) and call min-heapify(root) to adjust the heap. If a word comes whose count is < root then we ignore it since we want the maximum frequency words of size equal to that of the min-heap that we are maintaining.
Thoughts?
@vincent hash table will contain reference to max-heap . but from max - heap we can not access word so we have to scan hash table to find out top 10 words which is 0(n). But this can be done using array only. In 2 D array we can store word and count and get top 10 words in one scan. No need of this complicated DS.
Good point, I should not store the number of occurences in the heap, but the string itself.
It should be HashMap + Min Heap, as the decision to include new element in Heap or not would be taken by checking its frequency with the element having least frequency, and currently present in minHeap.
We need a size-bounded minheap. Because everytime we encounter a newword, we have to evict the entry with the minimum count. If the number of entry (in this case 10) grows, then hashtable can be used. Otherwise, hashtable is of no use. Here is the code:
import java.util.ArrayList;
public class Mostfrequentwords {
/*There is a big file of words which is dynamically changing.
We are continuously adding some words into it.
How would you keep track of top 10 trending words at each moment ?
*/
private ArrayList<WordEntry> heap;
private int maxsize=10;
public Mostfrequentwords(){
heap=new ArrayList<WordEntry>();
}
public void setMaxSize(int maxsize){
this.maxsize=maxsize;
}
private int Heap_minimum(){
return heap.get(0).getCount();
}
private int Heap_extract_min(){
int min=heap.get(0).getCount();
heap.set(0, heap.get(heap.size()));
heap.remove(heap.size());
min_Heapify(0);
return min;
}
private void min_Heapify(int i) {
// TODO Auto-generated method stub
int l=2*i;
int r=2*i+1;
int smallest=i;
if(l<heap.size() && heap.get(l).getCount()<heap.get(i).getCount()){
smallest=l;
}
else{
smallest=i;
}
if(r<heap.size() && heap.get(r).getCount()<heap.get(i).getCount()){
smallest=r;
}
else{
smallest=i;
}
if(smallest!=i){
swap(i,smallest);
min_Heapify(smallest);
}
}
private void swap(int x,int y){
WordEntry temp=heap.get(x);
heap.set(x,heap.get(y));
heap.set(y,temp);
}
private int getWordIndex(String word){
for(int i=0;i<heap.size();i++){
if(heap.get(i).getWord().equals(word)){
return i;
}
}
return -1;
}
private void min_Heap_Insert(String newword){
//check if the word exist
int newWordIndex=getWordIndex(newword);
WordEntry wordentry;
//if does not exist create a new one
if(newWordIndex==-1){
wordentry=new WordEntry(newword);
if(heap.size()==maxsize){
//we have to find the maximum one and evict it
if(wordentry.getCount()>=heap.get(0).getCount()){
heap.set(0,wordentry);
}
else{
//do not do anything
}
}
//there is place for addition of word
else{
heap.add(wordentry);
}
int i=heap.size()-1;
while(i>0 && heap.get(i/2).getCount()>heap.get(i).getCount()){
swap(i/2,i);
i=i/2;
}
}
//else get the index of the word
else{
wordentry=heap.get(newWordIndex);
wordentry.increasecount();
min_Heapify(newWordIndex);
}
}
public void addWord(String word){
min_Heap_Insert(word);
}
public void outputHeap(){
System.out.println("Heap entries:");
for(int i=0;i<heap.size();i++){
System.out.println("Word:"+heap.get(i).getWord()+":Count="+heap.get(i).getCount());
}
}
private class WordEntry{
private int count;
private String word;
public WordEntry(String x){
word=x;
count=1;
}
public void increasecount(){
count++;
}
public String getWord(){
return word;
}
public int getCount(){
return count;
}
}
public static void main(String[] args) {
// TODO Auto-generated method stub
String []words={"Test1","Test2","Test3","Test4","Test1","Test2","Test5","Test6","Test5"};
Mostfrequentwords mostfrequentwords=new Mostfrequentwords();
mostfrequentwords.setMaxSize(3);
for(int i=0;i<words.length;i++){
mostfrequentwords.addWord(words[i]);
mostfrequentwords.outputHeap();
}
}
}
Hi, Could anyone just tell me if the hashtable mentioned here actually has pointers to the node in the heap. i.e the wordcound and the words are stored in the max heap(tree). And you use hashing to get the pointer to the corrsponding node in the heap. Please let me know if this is wrong.
HashTable + Max-Heap
- Vincent August 24, 2012