Amazon Interview Question
SDE-1s
Country: India
Interview Type: Phone Interview
I actually think it should be a min/max heap. After building the heap of a million stars, each time you insert a new star you'll need to remove the max element.
No, it should be a max heap. If you have a bunch of stars in the heap and a new one comes in, say the one closest to the earth, which one do you replace? It makes sense to replace the star that has the maximal distance, because it is no longer useful.
A bad way of doing it is to sort everything and then check the first million entries.
A better way is to use the k-selection algorithm with a k statistic of 1 million, then just return the stars in the array before index 1 million. O(n) on average.
@SK you mean doing quickselect on the Kth element? Then the left part of the array will contain the 1 million nearest?
Could you please provide some detail about what exactly you mean by "the k selection algorithm with a k statistic of 1 million" and why it is O(N)?
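In case it helps: "k selection" here most likely means quickselect. Each partition pass keeps only one side of the array, so the expected work is n + n/2 + n/4 + ... = O(n) on average. Below is a minimal sketch (names like `select` are mine, not from the post above); after it runs, the k smallest elements sit in indexes 0..k-1, in no particular order:

```java
import java.util.Random;

public class QuickSelect {
    private static final Random RNG = new Random();

    // Rearranges a so that a[0..k-1] hold the k smallest elements (unordered).
    public static void select(int[] a, int k) {
        int low = 0, high = a.length - 1;
        while (low < high) {
            int p = partition(a, low, high);
            if (p == k - 1) break;       // pivot landed exactly at the boundary
            if (p < k - 1) low = p + 1;  // need more small elements: go right
            else high = p - 1;           // too many: narrow to the left side
        }
    }

    // Lomuto partition with a random pivot; returns the pivot's final index.
    private static int partition(int[] a, int low, int high) {
        int pivotIndex = low + RNG.nextInt(high - low + 1);
        swap(a, pivotIndex, high);
        int pivot = a[high], store = low;
        for (int i = low; i < high; i++) {
            if (a[i] <= pivot) swap(a, i, store++);
        }
        swap(a, store, high);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {9, 1, 8, 2, 7, 3, 6, 4, 5, 0};
        select(a, 4);
        java.util.Arrays.sort(a, 0, 4); // sort just the front for display
        System.out.println(java.util.Arrays.toString(
            java.util.Arrays.copyOfRange(a, 0, 4))); // prints [0, 1, 2, 3]
    }
}
```

For the star problem you would run this with k = 1,000,000 and then read off indexes 0..999,999.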
1 billion is too large a number to store in an array, so I thought about it and came up with this solution using a binary search tree.
While reading the input I insert the current number (distance) into the tree.
When the tree already holds 1 million elements, I delete the largest one and insert the current element (only if the current element is smaller than the largest one in the tree); otherwise I ignore this number.
Finally the tree holds the nearest 1 million stars and I can print them.
while (cin >> n) {
    if (NumberOfElementsInTheTree >= int(1e6)) { // >=, so the tree never exceeds one million
        if (n > TheLargestElementInTheTree) continue;
        else Delete(TheLargestElementInTheTree);
    }
    Insert(n);
}
I think it might be a valid approach to do a quicksort, but stop once you are sure the first million are sorted. Below is generalized code I wrote to perform a quicksort until the k lowest elements are positioned, sorted, between indexes 0 and k-1.
If the whole array of n items were sorted, the average case would be approximately n*log(n), which would be huge for n = 1 billion, but I think if you cut off after the k lowest are sorted it would be n*log(k), which would be better for k = 1,000,000.
I guess for n = 1,000,000,000 and k = 1,000,000 (base-2 logarithms):
n*log(n) ≈ 30 billion
n*log(k) ≈ 20 billion
so a tapered quicksort would be about 33% faster than the full quicksort in this case.
Please do let me know if I did anything wrong here.
public class QuickSort_K {

    /**
     * QuickSort an array until the k lowest items in the array are located
     * between indexes 0 and k-1.
     * @param array array of n items from which you wish to know the k lowest items
     * @param k the number of items that you wish to know are the lowest
     */
    public static void quickSort_K(final int[] array, final int k) {
        qSort(array, k, 0, array.length - 1);
    }

    private static void qSort(final int[] array, final int k, final int low, final int high) {
        if (k < low) return; // the list is sorted up to the kth element, no need to proceed
        if (low < high) {
            int p = partition(array, low, high);
            qSort(array, k, low, p - 1);
            qSort(array, k, p + 1, high);
        }
    }

    private static int partition(final int[] array, final int low, final int high) {
        int pIndex = choosePivot(array, low, high);
        int pivotValue = array[pIndex];
        swap(array, pIndex, high);
        int storeIndex = low;
        for (int i = low; i <= high - 1; ++i) {
            if (array[i] <= pivotValue) {
                swap(array, i, storeIndex);
                storeIndex++;
            }
        }
        swap(array, storeIndex, high);
        return storeIndex;
    }

    private static void swap(final int[] array, final int index1, final int index2) {
        int temp = array[index1];
        array[index1] = array[index2];
        array[index2] = temp;
    }

    private static int choosePivot(final int[] array, final int low, final int high) {
        // the cast must apply to the whole product; (int) Math.random() alone is always 0,
        // which would make every pivot equal to low
        return low + (int) (Math.random() * (high - low + 1));
    }

    // simple testing
    public static void main(String[] args) {
        int[] array = {5, 4, 7, 10, 8, 2, 1, 9, 6, 3};
        int k = 6;
        quickSort_K(array, k);
        int max = k;
        if (k > array.length) {
            max = array.length;
        }
        for (int i = 0; i < max; ++i) {
            System.out.print(array[i] + " ");
        }
        System.out.println();
    }
}
It is a bad idea to load all one billion distances into memory at once. The data will be given in some kind of file stored on disk, hopefully.
Let's say our memory can hold at most 'M' distances. Load the first 'M' distances, sort them using quicksort, and store them in file_1. Repeat for the rest of the data.
Now you have some 'X' files, all sorted.
Perform a merge on the 'X' sorted files to get the first K elements.
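The merge step can be sketched with a priority queue over the heads of the sorted runs. In this illustrative sketch (class and method names are mine) each "file" is modeled as an in-memory sorted array; a real implementation would stream records from disk instead:

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
    // Returns the K smallest values across the sorted runs.
    public static int[] firstK(List<int[]> sortedRuns, int K) {
        // Heap entries: {value, runIndex, offsetWithinRun}, ordered by value.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        for (int r = 0; r < sortedRuns.size(); r++) {
            if (sortedRuns.get(r).length > 0) {
                heap.add(new int[]{sortedRuns.get(r)[0], r, 0});
            }
        }
        int[] result = new int[K];
        int filled = 0;
        while (filled < K && !heap.isEmpty()) {
            int[] top = heap.poll();              // overall smallest remaining value
            result[filled++] = top[0];
            int r = top[1], next = top[2] + 1;
            if (next < sortedRuns.get(r).length) { // advance within that run
                heap.add(new int[]{sortedRuns.get(r)[next], r, next});
            }
        }
        return Arrays.copyOf(result, filled);
    }

    public static void main(String[] args) {
        List<int[]> runs = Arrays.asList(
            new int[]{1, 4, 7}, new int[]{2, 5, 8}, new int[]{0, 3, 6});
        System.out.println(Arrays.toString(firstK(runs, 5))); // prints [0, 1, 2, 3, 4]
    }
}
```

Since the heap only ever holds one entry per run, the merge costs O(K log X) after the O(N/M * M log M) sorting passes.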
I would propose the solution via a heap. Since sorting all the stars and keeping the first million is not feasible (O(N log N)), one can use a heap, which yields O(N log K) where K is the size of the short list. Notice that we are not even asked to return these stars in sorted order.
Let's try to design an algorithm:
(i) Use a max heap (or a max priority queue if you will)
(ii) Fill the heap with the first one million values
(iii) Compare each new value with the top of the heap
a) if bigger, do nothing
b) if smaller, replace the top of the heap
Finally, the heap contains the 1 million closest stars. A sample code is shown below:
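A minimal sketch of those steps in Java, assuming `java.util.PriorityQueue` with a reversed comparator serves as the max heap (names such as `kClosest` are illustrative, not from the original post):

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class ClosestStars {
    // Returns a max heap holding the K smallest distances seen in the stream.
    public static PriorityQueue<Double> kClosest(double[] distances, int K) {
        PriorityQueue<Double> maxHeap =
            new PriorityQueue<>(K, Collections.reverseOrder());
        for (double d : distances) {
            if (maxHeap.size() < K) {
                maxHeap.add(d);            // (ii) fill with the first K values
            } else if (d < maxHeap.peek()) {
                maxHeap.poll();            // (iii b) evict the farthest of the current K
                maxHeap.add(d);
            }                              // (iii a) otherwise do nothing
        }
        return maxHeap;                    // K closest stars, built in O(N log K)
    }

    public static void main(String[] args) {
        double[] distances = {4.2, 0.5, 9.9, 1.3, 7.7, 2.1};
        System.out.println(kClosest(distances, 3)); // contains 0.5, 1.3, 2.1
    }
}
```

In practice `distances` would be read from the input stream one value at a time, so only the K-element heap ever lives in memory.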
- autoboli April 27, 2015