Adobe Interview Question
Software Engineer / Developer, Country: India
// I am assuming the file contains only lowercase letters; we can also handle
// all symbols by enlarging the array.
int a[26] = {0};
int ch, i;
FILE *fp = fopen("abc.txt", "r");
while ((ch = fgetc(fp)) != EOF)   /* fgetc returns EOF at end of file, not NULL */
    if (ch >= 'a' && ch <= 'z')
        a[ch - 'a']++;
fclose(fp);
// sort the array in decreasing order (note: pair each count with its letter
// first if you also need to print which letter each count belongs to)
qsort(a, 26, sizeof(int), desc);   // desc: comparator returning *(const int *)y - *(const int *)x
// print the first 10 entries from the array
for (i = 0; i < 10; i++)
    printf("%d\n", a[i]);
Please correct me if I am wrong :)
Suppose your input is: my name is dhaker i am from warangal my branch is computer science and enggineering i want placement in adobe.
int n = 0;                        // number of distinct words seen so far
// for all i: count[i] = 0;
char *ch = strtok(str, " ");
while (ch)
{
    hash(ch);
    ch = strtok(NULL, " ");       // subsequent strtok calls must pass NULL
}
// partial bubble sort: 10 passes, each pushing the next-largest count
// (and its word) to the end of the arrays
for (int i = 0; i < 10; i++)
{
    for (int j = 0; j < n - 1 - i; j++)
        if (count[j] > count[j + 1])
        {
            swap(count[j], count[j + 1]);
            swap(s[j], s[j + 1]); // keep words paired with their counts
        }
    cout << s[n - 1 - i] << " ";
}
void hash(char *c)
{
    for (int i = 0; i < n; i++)
        if (strcmp(c, s[i]) == 0) { count[i]++; return; }
    strcpy(s[n], c);              // new word: add it with count 1
    count[n] = 1;
    n++;
}
If the file is small enough to fit in memory then the solution is straightforward ...
Use one min-heap of size 10 and a HashMap<String, Integer> mapping each word to its count.
heap = new MinHeap(k);   // k = 10 here
for each item in the hash map
// if you don't already have k items on the heap, add this one
if (heap.count < k)
heap.Add(item)
else if (item.frequency > heap.Peek().frequency)
{
// The new item's frequency is greater than the lowest frequency
// already on the heap. Remove the item from the heap
// and add the new item.
heap.RemoveRoot();
heap.Add(item);
}
But if it is a big file (a book, etc.), then it is trickier to find the top 10 words. I would use the SPACE-SAVING algorithm.
SPACESAVING(k)
n ← 0;
T ← ∅;
foreach i do
    n ← n + 1;
    if i ∈ T then            // if the word already exists, increase its count
        ci ← ci + 1;
    else if |T| < k then     // if T has fewer than k entries, add the new word
        T ← T ∪ {i};
        ci ← 1;              // a new word's count starts at 1
    else
        j ← arg minj∈T cj;   // if T is full, remove the least frequent word j
        ci ← cj + 1;         // the new word inherits the removed word's count + 1
        T ← T ∪ {i}\{j};
Here is a link to the original research paper:
dimacs.rutgers.edu/~graham/pubs/papers/freqcacm.pdf
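A compact C++ sketch of the SPACESAVING pseudocode above (my own illustration; `SpaceSaving`/`offer` are made-up names, and the linear `min_element` scan stands in for the paper's Stream-Summary structure):

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>

// Space-Saving: keep at most k counters; when full, evict the minimum
// counter and credit the incoming item with that count plus one.
class SpaceSaving {
    size_t k_;
    std::unordered_map<std::string, long> counts_;  // T with counts c_i
public:
    explicit SpaceSaving(size_t k) : k_(k) {}

    void offer(const std::string &item) {
        auto it = counts_.find(item);
        if (it != counts_.end()) {
            it->second++;              // word already tracked: bump its count
        } else if (counts_.size() < k_) {
            counts_[item] = 1;         // room left: start a new counter at 1
        } else {
            // Full: find the minimum counter j, replace it with the new
            // item, and give the new item c_j + 1 (a deliberate overestimate).
            auto min_it = std::min_element(
                counts_.begin(), counts_.end(),
                [](const auto &a, const auto &b) { return a.second < b.second; });
            long cj = min_it->second;
            counts_.erase(min_it);
            counts_[item] = cj + 1;
        }
    }

    const std::unordered_map<std::string, long> &counters() const {
        return counts_;
    }
};
```

Memory stays bounded at k counters no matter how long the stream is, which is the whole point for files too big for an exact hash table.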
1. Put in a hash table with <key,value> = <words, freq>
2. Iterate through the elements of the hash table to find the 10 most frequent
Requires O(number of distinct words) space and O(total number of words) time.
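The two steps above can be sketched with `std::unordered_map` plus `std::partial_sort` (a minimal sketch; `topWords` is a made-up name):

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Step 1: count every word in a hash table.
// Step 2: partially sort so only the first n slots are ordered by frequency.
std::vector<std::pair<std::string, int>> topWords(const std::string &text,
                                                  size_t n) {
    std::unordered_map<std::string, int> freq;
    std::istringstream in(text);
    for (std::string w; in >> w;) freq[w]++;

    std::vector<std::pair<std::string, int>> v(freq.begin(), freq.end());
    n = std::min(n, v.size());
    // O(D log n) over D distinct words: cheaper than a full sort.
    std::partial_sort(v.begin(), v.begin() + n, v.end(),
                      [](const auto &a, const auto &b) {
                          return a.second > b.second;
                      });
    v.resize(n);
    return v;
}
```

`partial_sort` avoids ordering the entire table when only the top 10 entries are needed.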
The answer seems trivial, but if the file size reaches megabytes the program may slow down, since thousands of hashes need to be computed. There might be a better solution, which I am still thinking about.
For generating the top 10 from the hash table's elements, follow the method below:
1. Initialize a min-heap with the first ten frequency values from the hash table.
2. As you insert each element into the hash table, compare the heap's minimum element (an O(1) operation) to the updated frequency of the element just inserted.
3. If it is larger, extract the min element from the heap and insert the current element, an O(log k) heapify operation.
4. If it is smaller, no heap operation is required.
The top-K elements can be maintained synchronously with each hash table operation and kept updated in real time at a cost of at most O(log K) per update, where K = 10 in this case.
This problem becomes difficult when the file is really big...
One solution could be to partition the file into multiple segments, compute a frequent-word list (using a hash table) for each segment, and merge them.
This way, the space complexity is minimized: if you split the file into n segments and store the top K frequent words for each segment, you need n*K space, which is less than storing every word of the file in one hash table.
One obvious danger is that some words may not make it into any segment's frequent-word list and would therefore be missing from the final list.
It is possible that a few words ignored in every segment would have made the true top-K list if not for this approximation.
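The partition-and-merge idea above can be sketched like this (my own illustration; `topKOfSegment`/`mergeTopK` are made-up names, and the lossiness the caveat describes is inherent: a word dropped from every segment's top-k never reaches the merge):

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Counts = std::vector<std::pair<std::string, int>>;

// Exact top-k word counts for one segment, sorted by descending frequency.
Counts topKOfSegment(const std::string &segment, size_t k) {
    std::unordered_map<std::string, int> freq;
    std::istringstream in(segment);
    for (std::string w; in >> w;) freq[w]++;
    Counts v(freq.begin(), freq.end());
    std::sort(v.begin(), v.end(),
              [](const auto &a, const auto &b) { return a.second > b.second; });
    if (v.size() > k) v.resize(k);   // truncation is where words can be lost
    return v;
}

// Merge the per-segment lists (at most n*k entries) and re-select the top k.
Counts mergeTopK(const std::vector<std::string> &segments, size_t k) {
    std::unordered_map<std::string, int> merged;
    for (const auto &seg : segments)
        for (const auto &p : topKOfSegment(seg, k))
            merged[p.first] += p.second;
    Counts v(merged.begin(), merged.end());
    std::sort(v.begin(), v.end(),
              [](const auto &a, const auto &b) { return a.second > b.second; });
    if (v.size() > k) v.resize(k);
    return v;
}
```

Peak memory is bounded by one segment's hash table plus the n*k merged entries, rather than the whole file's vocabulary.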
Use a trie to count the occurrences of the words, then traverse the trie to collect the frequency values at the word-terminating nodes. Insert the word/frequency objects into a max-heap and pop 10.
- kunikos January 01, 2013
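The trie approach above can be sketched like this (a minimal sketch assuming lowercase a-z words; `TrieNode`/`topKTrie` are my own names):

```cpp
#include <memory>
#include <queue>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Trie over lowercase words; each word-terminating node stores its count.
struct TrieNode {
    std::unique_ptr<TrieNode> child[26];
    int count = 0;
};

void insertWord(TrieNode *root, const std::string &w) {
    TrieNode *node = root;
    for (char c : w) {
        int i = c - 'a';
        if (!node->child[i]) node->child[i] = std::make_unique<TrieNode>();
        node = node->child[i].get();
    }
    node->count++;  // the word ends at this node
}

// DFS: collect (count, word) for every node that terminates a word.
void collect(const TrieNode *node, std::string &prefix,
             std::vector<std::pair<int, std::string>> &out) {
    if (node->count > 0) out.push_back({node->count, prefix});
    for (int i = 0; i < 26; i++)
        if (node->child[i]) {
            prefix.push_back('a' + i);
            collect(node->child[i].get(), prefix, out);
            prefix.pop_back();
        }
}

std::vector<std::string> topKTrie(const std::string &text, size_t k) {
    TrieNode root;
    std::istringstream in(text);
    for (std::string w; in >> w;) insertWord(&root, w);

    std::vector<std::pair<int, std::string>> all;
    std::string prefix;
    collect(&root, prefix, all);

    // Max-heap keyed on count; pop the top k words.
    std::priority_queue<std::pair<int, std::string>> heap(all.begin(), all.end());
    std::vector<std::string> out;
    for (size_t i = 0; i < k && !heap.empty(); i++) {
        out.push_back(heap.top().second);
        heap.pop();
    }
    return out;
}
```

Compared with a hash table, the trie shares prefixes between words, which can save space when the vocabulary has many words with common stems.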