Design a data structure to kee

Amazon Interview Question for SDE1s

1

of 1 vote

18
Answers
Design a data structure to keep track of top k elements out of 2 billion records.
Each record is numberd with a key which is 30 bit and a number which is count of how many times the customer has visited us.
Come up with an data structure so that the updation of element in 2 billion records will be faster.
Getting top k element will be faster
- panda April 22, 2015 in India | Report Duplicate | Flag | PURGE
Amazon SDE1 Data Structures

Email me when people comment.

An error occurred in subscribing you.

Country: India
Interview Type: Phone Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 2 vote

This can be solved using Min Heap data structure.

- Keep a min heap of size K.
- Add first K of the 2 billion records in the heap. O(K log K)
- Iterate through rest of the records and for each record
- If the record is greater than the heap top element (which is the smallest of all the heap elements) , pop the top element out of the min heap and insert the new element. O (log K)
- At the end the K elements remain in the heap are the top K elements.

Time Complexity: O (K log K) + N * O(log K) = O (K log K) + O (log K) as N is constant here.
Space Complexity: O(K)

- swapsmagic April 22, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

N is too big constant here!

- ninhnnsoc April 22, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Here we have to consider 2 things.
- N is too big number
- Interview is more interested on quick updating of values.

In minHeap, how do you update the existing node somewhere in the middle of tree?

- bharat April 23, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Yes N is big, but we only go through N elements once as part of computing the top k of the existing 2 billion records (which afaik are not sorted or anything that allows you to prune the search space). This is the best you can do, since you must look at all records to determine top k.

After this initial set up, updating and getting the top k becomes O(lgk) operations.

- JW April 26, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

Build heap takes O(k) not O (k logk)

- Jaylo May 14, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Add first K of the 2 billion records in the heap will take O(k) time.
Creating heap takes O(k) time
So time complexity would be : O (K) + (N-K) * O(log K)

Space complexity here is not O(K).
I was asked where will I keep ther 2 billion records ?
Such that when data is updated , We need to update the top k elements.

- panda April 22, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

How about a hashmap of objects containing the key of the record as key and address of the record object as value. That way updating a record would be O(1)?
What say?

- Anonymous April 22, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

How about a hashmap of objects containing the key of the record as key and address of the record object as value. That way updating a record would be O(1)?
What say?

- Coder April 22, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

How about using multiple levels of min heap ?

We can split the 2 billion data into an number of chunks say M and associate a min heap with each of the chunks. So updating a particular value will not affect the entire 2million data but only the corresponding chunk(This can we achieved by a hashtable , ie , given a record u can easily identify the chunk which it belongs to ) and its min heap.
Finally to get the top K elements , create a top level min heap by iterating through the M min heaps .

Complexity as i feel
to update : log K * (M/N) ==> To create a min Heap with K elements from (M/N) records. Assume that the record can be updated in O(1) time

to get the top K elements : log K * (M * K ) ==> To crate a min heap of K elements from the other M min heaps of K elements
space : (M + 1) * K ===>for min heaps
(M) ==> for hashtable

Please correct me if i am wrong.

- sarath April 24, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- sarath April 24, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- Anonymous April 24, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- Anonymous April 24, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- Anonymous April 24, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- Anonymous April 24, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- sarath April 24, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- sarath April 24, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- Anonymous May 31, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Use a HashMap to map key with Count object.
class Count{
int key;
int counter;
}
HashMap<Integer,Count> map;
Use a MinHeap of size k, to store top k counts.
PriorityQueue<Count> minHeap;
When the count increases for any key, update its Count object. Check if the count for any key which is not present in Heap is greater than the minimum count in Heap, update the Heap replacing the count object.
Time Complexity-nlogk
Space Complexity-n+k

- Iram22 March 09, 2017 | Flag Reply

CareerCup

Amazon Interview Question for SDE1s

Books

Videos

Resume Review

Mock Interviews