## Amazon Interview Question for Software Engineer / Developers

Country: United States
Interview Type: In-Person

The way I see it, a sequence can have length 1, which means any individual page in the log file forms a sequence. The algorithm is then simply to build a HashTable&lt;string, int&gt; and populate it with the pages and the number of occurrences of each page. Now generalize the question to "What is the most frequent sequence of length N?" A sequence length of 4 would mean we ask for the most frequent sequence P1->P2->P3->P4. The algorithm is then tuned to take not one page but N, storing them as a concatenated string key in the HashTable, e.g. P1#P2#P3#P4. Iterating over the log is done with a window of N elements instead of one page at a time.
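A minimal sketch of that idea, assuming the log is already parsed into a list of page strings (the function name and the `#` separator convention are taken from the comment above; everything else is my own naming):

```python
from collections import Counter

def most_frequent_sequence(pages, n):
    """Return the most common window of n consecutive pages and its count."""
    counts = Counter()
    for i in range(len(pages) - n + 1):
        # Concatenate the window into a single key, e.g. "P1#P2#P3#P4".
        key = "#".join(pages[i:i + n])
        counts[key] += 1
    if not counts:
        return None, 0
    return counts.most_common(1)[0]
```

For example, `most_frequent_sequence(["p1", "p2", "p3", "p2", "p3", "p4"], 2)` returns `("p2#p3", 2)`, since that window appears twice.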

Could you please explain more what exactly we need to do when we walk through the log file?

Assume each page is one element of a sequence. First we get all sequences of length 4, like (p1, p2, p3, p4), (p2, p3, p4, p5), etc., which takes O(n) time.
Then we encode each sequence as a key in a hashmap, with its frequency as the value. We increment the value if the key already exists. This takes O(n).
Then we just find the key with the highest value: O(n).

Overall O(n).

Could you please explain more what exactly we need to do when we walk through the log file? It sounds very similar to what I thought.

We just go through the log file and take the first 4 pages (p1, p2, p3, p4) as the first key, then (p2, p3, p4, p5) as the second key, and so on to the end. It takes O(n).

We could use a Trie data structure to store the sequences of size k.
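A hedged sketch of that idea: a trie keyed on pages, where the node at the end of each k-page path keeps a count of how often that window occurred. The class and function names here are my own, not from the thread:

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # page -> TrieNode
        self.count = 0      # occurrences of the path ending at this node

def count_sequences(pages, k):
    """Insert every window of k consecutive pages into a trie."""
    root = TrieNode()
    for i in range(len(pages) - k + 1):
        node = root
        for page in pages[i:i + k]:
            node = node.children.setdefault(page, TrieNode())
        node.count += 1  # one more occurrence of this k-page window
    return root

def most_frequent(root):
    """Walk the trie and return the (sequence, count) with the highest count."""
    best = ([], 0)
    def dfs(node, path):
        nonlocal best
        if node.count > best[1]:
            best = (path[:], node.count)
        for page, child in node.children.items():
            dfs(child, path + [page])
    dfs(root, [])
    return best
```

Compared with the concatenated-string key, the trie shares prefixes between overlapping windows, which can save space when many sequences start the same way.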

A Max-Heap data structure could be a nice solution to this problem. Each node represents a page and the key is its view count. Building a heap from n entries bottom-up takes O(n), and insertion/heapify is of course O(log n). Finding the k most frequent pages then takes O(k log n) extractions.
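For the single-page version of the problem, the heap idea might be sketched with Python's `heapq` (the function name is mine; `heapq.nlargest` handles the k extractions internally):

```python
import heapq
from collections import Counter

def top_k_pages(pages, k):
    """Return the k most-viewed pages with their view counts."""
    views = Counter(pages)  # page -> view count, O(n)
    # nlargest heapifies the (page, count) pairs and pops the k maxima.
    return heapq.nlargest(k, views.items(), key=lambda item: item[1])
```

For example, `top_k_pages(["a", "b", "a", "c", "a", "b"], 2)` returns `[("a", 3), ("b", 2)]`.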

You missed the part where we need to find the max sequence, not a single page.

Assume pages are represented by page ids. Keep a running sum of the ids in the current window, use that sum as the key in a hashtable, and the value is a count of how many times that sum was found.
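A sketch of that idea, assuming integer page ids and a fixed window size n (names are my own). Note that a plain sum collides easily, since it ignores order (p1 + p2 equals p2 + p1), so the sum can only pre-filter candidate windows, not identify a sequence on its own:

```python
from collections import Counter

def window_sum_counts(page_ids, n):
    """Count how often each rolling sum of n consecutive page ids occurs."""
    counts = Counter()
    window_sum = sum(page_ids[:n])
    counts[window_sum] += 1
    for i in range(n, len(page_ids)):
        window_sum += page_ids[i] - page_ids[i - n]  # slide the window in O(1)
        counts[window_sum] += 1
    return counts
```

The rolling update keeps the whole pass at O(n), but a second pass over the colliding windows would still be needed to confirm the actual sequence.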

I asked him the same question, and he said it wouldn't matter, because if the log file is in the format userName -> page, then the question becomes: find the most frequent sequence of pages among all users. So it would not make a big difference; the data structure and logic should be roughly the same. I am thinking we could use a hashtable to store all unique sequences, concatenating the page strings with a special character like #. If the sequence already exists in the hashtable, we just increment the value by 1, where the value is the count of the associated sequence. But this might waste a lot of space, since the hashtable stores all unique sequences.
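The per-user variant might be sketched like this, assuming the log is a list of (userName, page) pairs and windows are counted within each user's own page stream (the helper names are mine):

```python
from collections import Counter, defaultdict

def most_frequent_sequence_per_user(log, n):
    """log: iterable of (user, page) pairs, in log order.
    Counts length-n windows within each user's stream, then returns
    the most common (sequence_key, count) overall."""
    streams = defaultdict(list)
    for user, page in log:
        streams[user].append(page)  # group pages by user, preserving order
    counts = Counter()
    for pages in streams.values():
        for i in range(len(pages) - n + 1):
            counts["#".join(pages[i:i + n])] += 1
    return counts.most_common(1)[0] if counts else None
```

Grouping by user first matters: without it, interleaved entries from different users would produce windows that no single user ever actually visited.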
