## Expedia Interview Question for Software Engineer / Developers

Country: United States

Construct a trie with an extra count field in each node during a first pass over the book. Once done, output the 10 most frequently occurring words. Maybe there are more optimal algorithms, but this is what struck me immediately. Assuming there are N words in the book, of which k are distinct: building the trie takes O(N) time (times the average word length), and finding the 10 most frequent words takes O(k), since each distinct word is visited once.
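This answer can be sketched in Python as follows; the node layout and the DFS over distinct words are illustrative choices, not part of the original answer:

```python
from heapq import nlargest

class TrieNode:
    """Trie node with the extra count field the answer describes."""
    def __init__(self):
        self.children = {}
        self.count = 0  # occurrences of the word ending at this node

def build_trie(words):
    root = TrieNode()
    for word in words:          # one pass over all N words
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1
    return root

def top_k(root, k=10):
    """Visit each of the k distinct words once, then keep the k largest."""
    found = []
    def dfs(node, prefix):
        if node.count:
            found.append((node.count, prefix))
        for ch, child in node.children.items():
            dfs(child, prefix + ch)
    dfs(root, "")
    return nlargest(k, found)

text = "the quick fox and the lazy dog and the fox".split()
print(top_k(build_trie(text), 3))  # → [(3, 'the'), (2, 'fox'), (2, 'and')]
```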

---
Something better than a trie, such as a ternary search tree, can be used. Also, since we only want the top 10, a list of the top 10 words can be maintained in real time, eliminating the extra pass to search for them.

---
It will be impossible to do this in real time, because the sort key (frequency) is not known ahead of time.

---
Since we are counting words from a predefined language (e.g. English), we can use a predefined dictionary of the language's words and their digests (hash values). We then place each word from the book into the bucket associated with its digest in constant time (computing the digest and placing the word into the appropriate bucket are both O(1)). Once we are done parsing the book, we end up with the word counts, and these can be sorted in O(lg n) time (base 2).
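The bucketing idea amounts to hashing words into a table of counts; a minimal sketch (a Python dict stands in for the digest/bucket structure, and the function names are my own):

```python
def count_words(words):
    """Hash each word into a bucket in amortized O(1) per word."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

book = "to be or not to be".split()
counts = count_words(book)
# Note: sorting the k distinct entries afterwards costs O(k log k)
# with a comparison sort, not O(lg n).
top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:10]
print(top)  # → [('to', 2), ('be', 2), ('or', 1), ('not', 1)]
```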

---
Which sorting algorithm gives O(lg n)? The best I know of is O(n).

---
A TreeMap data structure can be used: sort the words by count and store them, keeping only the top 10 per the condition specified.
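Python's standard library has no TreeMap, so as a stand-in the "store only the top 10" idea can be sketched with a size-bounded min-heap over (count, word) pairs; this is a substituted technique, not the commenter's exact structure:

```python
import heapq

def keep_top10(counts, k=10):
    """Keep only the k highest-count (count, word) pairs."""
    top = []
    for word, count in counts.items():
        heapq.heappush(top, (count, word))
        if len(top) > k:
            heapq.heappop(top)           # evict the current smallest
    return sorted(top, reverse=True)     # largest count first

counts = {"the": 12, "a": 9, "fox": 3, "dog": 2, "of": 7}
print(keep_top10(counts, 3))  # → [(12, 'the'), (9, 'a'), (7, 'of')]
```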

---
Step 1: If the file (the book) is too large to sort in memory, split it into chunks that can each be sorted in memory.

Step 2: For each sorted chunk, compute sorted pairs of (word, nr_occurrences); at this point you can discard the chunks, because you only need the sorted pairs.

Step 3: Merge the pairs from all chunks, summing counts, and always keep the current top ten.

Example:

Step 1:

a b a ab abb a a b b c c ab ab

split into :

chunk 1: a b a ab
chunk 2: abb a a b b
chunk 3: c c ab ab

Step 2:

chunk 1: a2, b1, ab1
chunk 2: a2, b2, abb1
chunk 3: c2, ab2

Step 3 (merge the chunks and keep the top ten appearances):

a4 b3 ab3 c2 abb1
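The three steps can be sketched as follows; the chunk boundaries here are uniform, unlike the worked example above, but the merged totals agree (function names are my own):

```python
from collections import Counter

def chunk_counts(words, chunk_size):
    """Steps 1+2: split into memory-sized chunks, count each independently."""
    for i in range(0, len(words), chunk_size):
        yield Counter(words[i:i + chunk_size])

def merge_top(counters, k=10):
    """Step 3: merge the per-chunk counts and keep the top k."""
    total = Counter()
    for c in counters:
        total.update(c)
    return sorted(total.items(), key=lambda kv: kv[1], reverse=True)[:k]

words = "a b a ab abb a a b b c c ab ab".split()
print(merge_top(chunk_counts(words, 5)))
# → [('a', 4), ('b', 3), ('ab', 3), ('c', 2), ('abb', 1)]
# i.e. the example's "a4 b3 ab3 c2 abb1"
```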

---
Add the words to a hashmap with the count as the value. Then insert the entries into a max-heap, so that you can directly pop the 10 maximum elements. Running time = insertion of n words into the hashtable + insertion of the entries into the max-heap + selecting the top 10 from the max-heap = O(n) + O(n log n) + O(10 log n) = O(n log n).
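A sketch of this answer using Python's `Counter` and `heapq` (negated counts simulate a max-heap, since `heapq` is a min-heap; the function name is my own):

```python
from collections import Counter
import heapq

def top10(words, k=10):
    """Count with a hashmap, then pop the k largest from a max-heap."""
    counts = Counter(words)                      # O(n) hashtable pass
    heap = [(-c, w) for w, c in counts.items()]  # negate counts for a max-heap
    heapq.heapify(heap)                          # O(k) for k distinct words
    return [(w, -c) for c, w in
            (heapq.heappop(heap) for _ in range(min(k, len(heap))))]

print(top10("b a b c b a".split()))  # → [('b', 3), ('a', 2), ('c', 1)]
```

Building the heap with `heapify` over the k distinct entries is O(k), slightly better than the O(n log n) of n individual insertions, though the overall bound is unchanged by the counting pass.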

