Amazon Interview Question
SDE1sCountry: India
Interview Type: In-Person
Does it matter ? :) if you're looking for K-th largest element by partitioning array every time, after finding a K-th largest element, the elements in array after K will be K largest elements. In this caser array=file.
yeah, you are right. It would contain target elements ( but not in sorted order, that was what confused me !! )
But how can you find the
k-th order statistics
externally ? Don't you have to keep in memory the whole file ? I haven't seen the Selection algorithm working externally yet and I couldn't find it,
Thanks
Hi Gevorg.
Have you considered meeting in your suggested solution the constraint
"You dont have RAM to store even k elements."
The K-th order statistics does require at least "K * sizeof(object)" allocated memory. Or you maybe suggesting some implementation that could use file system ?
Ashot, of course I'm suggesting to modify it to work with file system. Something like
1. Choose a pivot element
2. Start reading a file, write elements less than pivot into one file, and elements greater than pivot into another file. Depending on position of pivot (less than or greater than K) recurse into one of output files.
Working with files will make it really slow, but there is no other choice here. The only optimisation we can do here is to read an input file by blocks of maximum size which can fit into memory instead of reading one element at a time.
So, basically idea is same as in external sort, when you have to sort a file which doesn't fit in memory.
Since we do not have enough memory to store the values, we need to use external file to store the numbers.
File will store the k largest numbers.
Every time we read a number from the input file, we need to compare with the already present numbers in the output file then accordingly store or reject this number.
Complexity is O(k*n).
Can we improve the searching in the output file?
Since we can't hold even k elements in memory, another alternative is to use a min-heap of size k/2. In the first pass, keep storing the top k/2 elements in the min-heap and output them at the end of the iteration.
In the second pass, store the next top k/2 elements and output them.
Time complexity would be O(nlg(k/2)). If k is large, then it would make a difference to Subhajit's algorithm. Otherwise, time complexity would be nearly the same as his.
N - size of input
K - size of required set
M - size of memory
- build minheaps of size M, we will have K/M heaps.
- Within memory store the smallest value (say MIN), and a pointer to the minheap (say HP) with this smallest value
- When reading the input if you encounter a value less than MIN, then adjust HP to add the new element - Time complexity log M
- Search through K/M heaps to find new MIN and HP - time complexity log (K/M)
Time complexity will be N (log M + log (K/M))
I do this with MemoryMappedFile in C# if I understand your question right.
MemoryMappedFile mySequentialMMF = MemoryMappedFile.CreateFromFile(fileName, FileMode.Open, "MySequentialMap");
bool IsmutexCreated;
Mutex mymutex = new Mutex(true, "NonPersisterMemoryMappedFilemutex", out IsmutexCreated);
StreamReader sr = new StreamReader(mySequentialMMF.CreateViewStream());
while (!sr.EndOfStream)
{
if (sr.ReadLine() > MAXItem)
{
MAXItem = sr.ReadLine();
}
}
sr.Close();
mymutex.ReleaseMutex();
Ok Amit let's understand what is the difference between min heap and max heap. In the first one uses ascending priority where the smallest item is the first to be popped from the heap. A max heap uses descending priority where the largest item is the first to be popped.
-1 to all commenters in this thread.
1. First of all, the solution posted on top does something weird, but does't solve the problem
2. Max heap will not help here, since problem says that even K elements wouldn't fit in the memory
3. If they would fit, max heap is the solution, not the min heap.
Please do a dry run of your algo using maxheap..you will find why exactly do you need minheap...Nd if you find nothing wrong, write your code here with maxheap. I will give test cases where that will fail :)
Well, assuming that using a heap already violates problem statement (remember, we can't store even K elements in memory ?), my solution is follows (also violates the property that file cannot fit in memory)
1. read all numbers into memory
2. Create MAX HEAP out of that elements (O(N))
3. Attention.... Extract MAX K times !
total complexity – O(N) + O(k*logN), memory - O(N)
Using the min heap you'll use less memory ( O(K) ) but algorithm will be O(N*logK), which is greater than O(k*LogN) if K < N (which obviously is).
Just implementing the K-th order statistic algorithm to work with files will solve this problem. Complexity - O(N)
- gevorgk June 28, 2013