Design question to find the mo

Amazon Interview Question for Software Engineer / Developers

-2

of 2 votes

11
Answers
Design question to find the most frequent sequence of web page views from a log file of all the web pages viewed. Say the sequence is 4.
- Guy January 16, 2014 in United States | Report Duplicate | Flag | PURGE
Amazon Software Engineer / Developer Data Structures

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 3 vote

The way I see it, a sequence could be of a length=1 which means any individual page in the log file makes a sequence. Now the algorithm will be just to construct a HashTable<string, int> and populate it with the pages and the number of occurrences for that page. Then we generalize it and "design" the question - "What is the most frequent sequence of length N" Say the sequence is 4 would mean that we ask for the most frequent sequence P1->P2->P3->P4. Then the algorithm should be just tuned to take not one page but N and store those as concatenated string as the key in the HashTable. Like P1#P2#P3#P4. The iterating over the log would be with a window of N elements instead of one at a time.

- napoapo January 17, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Could you please explain more what exactly we need to do when we walk through the log file?

- Guy January 17, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

assume each page takes 1 sequence, and first we get all sequences of length 4. It is like (p1, p2, p3, p4), (p2,p3, p4, p5) etc, so it takes O(n) time.
then we encode this sequence to a hashmap as the keys, and the frequency is value. We increase the value if the key has exited. it takes O(n)
then just find the key with highest value. O(n)

Overall O(n)

- tango January 17, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Could you please explain more what exactly we need to do when we walk through the log file? It sounds like very similar to what I thought?

- Guy January 17, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

we just go through the log file and take the first 4 pages(p1,p2,p3,p4) as the first key, and then (p2,p3,p4,p5) as the second key, all the way to the end. It takes O(n)

- tango January 21, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

We could use Trie Data Strucutre to store the sequence of size k

- RG January 17, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

A Max-Heap data structure could be a nice solution to this problem. Each node represents a page and the key is its views count. Building a heap from n entries requires O(n log n) and insertion/heapify is ofcourse O(log n) .. Finding the most k frequent pages is O(k).

- ahmad.m.bakr January 19, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

You missed the part where we need to find the max sequence, not a single page.

- Guy January 20, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

assume pages are represented by page id’s. now keep looking for that sum, and use that sum as key for hashtable and value is count of how many time that sum was found

- krutik January 29, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I asked him the same question, and he said it wouldn't matter because if the log file is in format for userName -> page, then the question would become find the most frequent sequence of pages among all users. So it would not make a big difference. The data structure and logic should be roughly the same. I am thinking we could use a hashtable to store all unique sequence by concatenating the page string by a special character like #. If the sequence already exists in hashtable, we just increment the value by 1 where value is the count of the associated sequence. But this might waste a lot of space since the hashtable stores all unique sequence.

- Guy January 16, 2014 | Flag

CareerCup

Amazon Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews