Amazon Interview Question
SDE-2sCountry: India
Interview Type: In-Person
For part1 of the problem, use a min-heap of size k. After a number is read from the stream, if the size of the heap is less than k, insert it into the heap. Otherwise, if the given number is greater than the min-value of the heap, evict the min value and insert the given number into the heap. At the end, you will have top k elements left in the heap.
For part2, let L = m/k(m<k, per the question). Here again use a min-heap, but of size k/(L+1). In the first pass, find out the top k/(L+1) elements. In the second pass, find out the next top k/(L+1), by inserting into the heap only the elements which are less than the first top k/(L+1). Like this, make L+1 passes over the data, each time capturing a window of k/(L+1) ordered elements. In the end, we'll be left with top k elements.
Great Solution.
However, are you sure about the value of L being m/k. I think it should be k/m.
To start with we can have the min-heap of any size as long as it fits in the memory m. Therefore, let us assume, the size of heap gets reduced by a factor d i.e. size of the heap is k/d.
Therefore, k/d <= m
i.e. k/m <= d
i.e. d can be k/m + 1
If L = k/m + 1, then d = L + 1.
When a "stream of integers" is mentioned, we don't have the numbers saved in something like an array and we should process them one by one as they arrive.
Hence, your part2 answer would actually require O(n) memory. It sounds to me one of those questions where we can't get 100% correctness given those constraints and we have to be creative and talkative with the interviewer.
For every m numbers from the stream, create a sorted list and write it to a file. Once you have the stream completed you would have N files each with m sorted numbers in them. This is when you would do an external merge sort on all of the files and have a final sorted list. Take the first K numbers.
How would you do external merge sort with 'm' memory < N * m . What is the overall complexity of your solution to query the top K numbers at any time ?
For part 1 of the problem, use min-heap of size k.
- cy August 14, 2013For part 2 of the problem,
1. get the first k number from the stream, and the smallest number of these k number, say mink
2. store the first k number into temp file
3. continue scan the stream, put the number larger than mink into memory buffer
4. if memory buffer is full (size larger than m), merge the buffer into temp file
1. only keep the top k larger number
1. get the new mink (k-th large number)
5. empty the memory buffer, go back to step 3 until the stream end