Amazon Interview Question
SDE-1s
Country: India
Interview Type: Phone Interview
Here we have to consider two things:
- N is a very large number.
- The interviewer is more interested in quickly updating values.
In a min-heap, how do you update an existing node somewhere in the middle of the tree?
Yes, N is big, but we only go through the N elements once as part of computing the top k of the existing 2 billion records (which, as far as I know, are not sorted or organized in any way that would let you prune the search space). This is the best you can do, since you must look at every record to determine the top k.
After this initial setup, updating and getting the top k become O(log k) operations.
Adding the first K of the 2 billion records to the heap takes O(K) time (heapify).
So the time complexity would be: O(K) + (N - K) * O(log K).
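The single-pass scheme above can be sketched as follows. This is a minimal illustration, not code from the thread; the class and method names (`TopK`, `topK`) are mine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Keep a min-heap of size K while scanning all N records once.
// The heap's root is always the smallest of the current top K.
class TopK {
    static List<Integer> topK(int[] records, int k) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        for (int r : records) {
            if (minHeap.size() < k) {
                minHeap.offer(r);           // filling the heap: O(log K) per insert
            } else if (r > minHeap.peek()) {
                minHeap.poll();             // evict the smallest of the current top K
                minHeap.offer(r);           // O(log K)
            }
        }
        List<Integer> result = new ArrayList<>(minHeap);
        result.sort(null);                  // ascending order for a stable answer
        return result;
    }
}
```

Each of the N - K remaining records costs at most one pop and one push, which is where the (N - K) * O(log K) term comes from.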
Space complexity here is not O(K).
I was asked where I would keep the 2 billion records, such that when the data is updated, we can update the top k elements.
How about a hashmap of objects, with the key of the record as the key and the address of the record object as the value? That way, updating a record would be O(1).
What say?
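The hashmap idea above can be sketched like this: map each record's key to the record object itself, so an update is an O(1) expected-time lookup plus an in-place mutation. The `Record`/`RecordIndex` names are illustrative, not from the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Index the 2 billion records by key so any one of them can be
// located and updated in O(1) expected time.
class RecordIndex {
    static class Record {
        final int key;
        long count;
        Record(int key, long count) { this.key = key; this.count = count; }
    }

    private final Map<Integer, Record> byKey = new HashMap<>();

    void put(Record r) { byKey.put(r.key, r); }

    // O(1) expected: find the record via the map and bump its count in place.
    void increment(int key) {
        Record r = byKey.get(key);
        if (r != null) r.count++;
    }

    long countOf(int key) { return byKey.get(key).count; }
}
```

Note this only makes the record update O(1); keeping the top-k heap consistent after the update still needs a separate heap adjustment.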
How about using multiple levels of min-heaps?
We can split the 2 billion records into a number of chunks, say M, and associate a min-heap with each chunk. Updating a particular value will then not affect the entire 2 billion records but only the corresponding chunk and its min-heap (this can be achieved with a hashtable, i.e., given a record you can easily identify the chunk it belongs to).
Finally, to get the top K elements, create a top-level min-heap by iterating through the M min-heaps.
Complexity, as I see it:
to update: O(log K), to fix up the affected chunk's heap of K elements built from that chunk's N/M records (assume the record itself can be updated in O(1) time)
to get the top K elements: O(M * K * log K), to create a min-heap of K elements from the M min-heaps of K elements each
space: (M + 1) * K for the min-heaps, plus O(N) for the hashtable mapping records to chunks
Please correct me if I am wrong.
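The chunked scheme above can be sketched as two steps: build a size-K min-heap per chunk, then merge the M * K surviving candidates into one final size-K heap. This is my own illustrative sketch of the proposal; the names (`ChunkedTopK`, `chunkHeap`, `mergeTopK`) are not from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// One size-K min-heap per chunk, merged on demand into a final top K.
class ChunkedTopK {
    // Build the top-K heap for a single chunk of records.
    static PriorityQueue<Integer> chunkHeap(int[] chunk, int k) {
        PriorityQueue<Integer> h = new PriorityQueue<>();
        for (int v : chunk) {
            if (h.size() < k) h.offer(v);
            else if (v > h.peek()) { h.poll(); h.offer(v); }
        }
        return h;
    }

    // Merge the M per-chunk heaps: at most M*K candidates, O(M * K * log K).
    static List<Integer> mergeTopK(List<PriorityQueue<Integer>> heaps, int k) {
        PriorityQueue<Integer> top = new PriorityQueue<>();
        for (PriorityQueue<Integer> h : heaps) {
            for (int v : h) {               // iteration order doesn't matter here
                if (top.size() < k) top.offer(v);
                else if (v > top.peek()) { top.poll(); top.offer(v); }
            }
        }
        List<Integer> out = new ArrayList<>(top);
        out.sort(null);
        return out;
    }
}
```

An update only rebuilds one chunk's heap, which is the whole point of the chunking: the other M - 1 heaps are untouched.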
Use a HashMap to map each key to a Count object.

    class Count {
        int key;
        int counter;
    }

    HashMap<Integer, Count> map;

Use a min-heap of size k to store the top k counts:

    PriorityQueue<Count> minHeap;

When the count increases for any key, update its Count object in O(1) via the map. If a key that is not present in the heap now has a count greater than the minimum count in the heap, replace that minimum Count object in the heap.
Time complexity: O(n log k)
Space complexity: O(n + k)
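Putting the map and the heap together, the scheme above might look like this. This is a hedged sketch, not the poster's code: the `TopKCounter` name and the re-insert-on-update handling are mine, and note that `PriorityQueue.contains`/`remove` are O(k) scans, so the per-event cost is O(k) rather than a strict O(log k) unless an indexed heap is used.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// key -> Count via a HashMap (O(1) update), plus a size-k min-heap of Counts.
class TopKCounter {
    static class Count {
        final int key;
        int counter;
        Count(int key) { this.key = key; }
    }

    private final Map<Integer, Count> map = new HashMap<>();
    private final PriorityQueue<Count> minHeap =
            new PriorityQueue<>(Comparator.comparingInt((Count c) -> c.counter));
    private final int k;

    TopKCounter(int k) { this.k = k; }

    void record(int key) {
        Count c = map.computeIfAbsent(key, Count::new);
        c.counter++;                               // O(1) update via the map
        if (minHeap.contains(c)) {
            minHeap.remove(c);                     // re-insert to restore heap order
            minHeap.offer(c);                      // after the counter changed
        } else if (minHeap.size() < k) {
            minHeap.offer(c);
        } else if (c.counter > minHeap.peek().counter) {
            minHeap.poll();                        // evict the smallest of the top k
            minHeap.offer(c);
        }
    }

    int[] topKeys() {
        return minHeap.stream().mapToInt(c -> c.key).sorted().toArray();
    }
}
```

The removal-and-reinsert step matters because mutating `counter` while the object sits inside the heap would silently break the heap invariant.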
This can be solved using a min-heap data structure.
- swapsmagic April 22, 2015
- Keep a min-heap of size K.
- Add the first K of the 2 billion records to the heap: O(K log K) if inserted one by one (O(K) with heapify).
- Iterate through the rest of the records, and for each record:
- If the record is greater than the heap's top element (which is the smallest of all the heap elements), pop the top element off the min-heap and insert the new element: O(log K).
- At the end, the K elements remaining in the heap are the top K elements.
Time complexity: O(K log K) + (N - K) * O(log K) = O(N log K).
Space complexity: O(K)