## Amazon Interview Question

Software Engineer / Developers**Country:**United States

**Interview Type:**In-Person

assume each page takes 1 sequence, and first we get all sequences of length 4. It is like (p1, p2, p3, p4), (p2,p3, p4, p5) etc, so it takes O(n) time.

then we encode this sequence to a hashmap as the keys, and the frequency is value. We increase the value if the key has exited. it takes O(n)

then just find the key with highest value. O(n)

Overall O(n)

Could you please explain more what exactly we need to do when we walk through the log file? It sounds like very similar to what I thought?

A Max-Heap data structure could be a nice solution to this problem. Each node represents a page and the key is its views count. Building a heap from n entries requires O(n log n) and insertion/heapify is ofcourse O(log n) .. Finding the most k frequent pages is O(k).

I asked him the same question, and he said it wouldn't matter because if the log file is in format for userName -> page, then the question would become find the most frequent sequence of pages among all users. So it would not make a big difference. The data structure and logic should be roughly the same. I am thinking we could use a hashtable to store all unique sequence by concatenating the page string by a special character like #. If the sequence already exists in hashtable, we just increment the value by 1 where value is the count of the associated sequence. But this might waste a lot of space since the hashtable stores all unique sequence.

The way I see it, a sequence could be of a length=1 which means any individual page in the log file makes a sequence. Now the algorithm will be just to construct a HashTable<string, int> and populate it with the pages and the number of occurrences for that page. Then we generalize it and "design" the question - "What is the most frequent sequence of length N" Say the sequence is 4 would mean that we ask for the most frequent sequence P1->P2->P3->P4. Then the algorithm should be just tuned to take not one page but N and store those as concatenated string as the key in the HashTable. Like P1#P2#P3#P4. The iterating over the log would be with a window of N elements instead of one at a time.

- napoapo January 17, 2014