Ebay Interview Question
Software Engineer / DevelopersCountry: United States
this is wrong. if data is less than the top of the heap then we cannot ignore it. we have to see if it greater than the minimum of the 100 that we already have. if no then we can ignore. so we have to maintain one max heap and one min heap. both of size 100. the min head will have the smallest element in the root whereas the max heap has largest element at the root. when a new number comes in we check if it is grater than the root of the max heap. if yes then we remove the root of the max heap and add this number. if the new number is less than the root of the max heap then we check if it grater than the root of the min heap. if yes then we delete the root of the min heap and add this new number to the min heap and adjust it. if the new number is less than the root of the max heap and less than the root of the min heap then we ignore it.
Take a max heap of size 100. Keep inserting elements into the heap.
Initially, the first 100 elements would all go into the heap. After that, the maximum of those 100 elements would be at the top of the heap. This is the current top 100.
So, if the 101th element is less than the top, it means it is part of the top 100 and the element which is currently at the top must be removed. Add this 101th element to the heap.
If the 101th element is greater than the top, it means it is greater than all of the 100 elements which are currently in the heap. So, don't add it to the heap.
Keep doing this by loading as much of the elements you can in your memory. This can be easily calculated by the total amount of memory - memory to store a heap of size 100.
I think this should convince the interviewer.
To largest mean the largest 100 elements. You have to use min heap and not max heap. Max heap should be used if you have to find the smallest 100 items. Try out with a small set of numbers (10), where you have to find the largest 4. If your numbers are from 1-10, the min heap at the end will contain numbers 7,8,9,10.
In my opinion, the question is how to keep track of 100 biggest numbers of the huge file / array / database.
When first 100 elements come to the file, we remember each of them in the array of 100 integers, and also keep tracking
of what is MIN and what is MAX.
After all 100 indexes are occupied, we do the following upon the next element coming:
If the current element <= MAX element of the array - do nothig;
If the current element > MAX element of the array:
a). Insert this element into array on the place of MIN element;
b). Recalculate MIN and MAX elements of the array.
This question can be easily solved using min-heap of size 100, initially load any 100 elements of the file into the min-heap then starting from element number 101 to the size of residual file, consider 1 element at a time and check if that element is greater than the minimum element if it is indeed greater then use the extract-min operation on the min-heap which can be done in O(logn) time and insert this new element into the min-heap which gives us our current collection of 100 largest elements and repeat the same procedure for other elements in the file. I hope I am correct in my reasoning :).
That's right. One small correction: doing the heap operations isn't log(n); it's log(100). That is, the cost of the heap operations is proportional to the log of the heap size, not the total array size. This means the complexity of selecting the largest k elements from an array of size n using this method would be O(n log k).
I think you should use min heap of size 100, NOT max heap.
- Yaya January 08, 20121. add first 100 data into the heap
2. pick up a data and compare it with the top one of heap, if the data is greater than the top data of the heap, use it replace the top one and adjust the heap. if the data is less than the top data of heap, ignore it.