Amazon Interview Question
SDE-1s
Country: India
Interview Type: Phone Interview
I actually think it should be a min/max heap. After building the heap of a million stars, each time you insert a new star you'll need to remove the max element.
No, it should be a max heap. If you have a bunch of stars in the heap and a new one comes in, say the one closest to the earth, which one do you replace? It makes sense to replace the star that has the maximal distance, because it is no longer useful.
A bad way of doing it is to sort everything and then check the first million entries.
A better way is to use the k-selection algorithm with a k statistic of 1 million, then just return the stars in the array before index 1 million. O(n) on average.
@SK you mean doing quickselect on the Kth element? Then the left part of the array will contain the 1 million nearest?
Could you please provide some detail about what exactly you mean by "the k selection algorithm with a k statistic of 1 million" and why it is O(N)?
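In case it helps: "k selection" here most likely means quickselect. Each partition pass keeps only one side of the array, so the expected work is n + n/2 + n/4 + ... = O(n) on average. Below is a minimal sketch (names like `select` are mine, not from the post above); after it runs, the k smallest elements sit in indexes 0..k-1, in no particular order:

```java
import java.util.Random;

public class QuickSelect {
    private static final Random RNG = new Random();

    // Rearranges a so that a[0..k-1] hold the k smallest elements (unordered).
    public static void select(int[] a, int k) {
        int low = 0, high = a.length - 1;
        while (low < high) {
            int p = partition(a, low, high);
            if (p == k - 1) break;       // pivot landed exactly at the boundary
            if (p < k - 1) low = p + 1;  // need more small elements: go right
            else high = p - 1;           // too many: narrow to the left side
        }
    }

    // Lomuto partition with a random pivot; returns the pivot's final index.
    private static int partition(int[] a, int low, int high) {
        int pivotIndex = low + RNG.nextInt(high - low + 1);
        swap(a, pivotIndex, high);
        int pivot = a[high], store = low;
        for (int i = low; i < high; i++) {
            if (a[i] <= pivot) swap(a, i, store++);
        }
        swap(a, store, high);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {9, 1, 8, 2, 7, 3, 6, 4, 5, 0};
        select(a, 4);
        java.util.Arrays.sort(a, 0, 4); // sort just the front for display
        System.out.println(java.util.Arrays.toString(
            java.util.Arrays.copyOfRange(a, 0, 4))); // prints [0, 1, 2, 3]
    }
}
```

For the star problem you would run this with k = 1,000,000 and then read off indexes 0..999,999.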
1 billion is too large a number to store in an array, so I thought about it and came up with this solution using a binary search tree.
While reading the input I insert the current number (distance) into the tree.
When the tree already holds 1 million elements, I delete the largest one and insert the current element (only if the current element is smaller than the largest one in the tree); otherwise I ignore this number.
Finally the tree holds the nearest 1 million stars and I can print them.
while (cin >> n) {
    if (NumberOfElementsInTheTree >= int(1e6)) { // >=, so the tree never exceeds one million
        if (n > TheLargestElementInTheTree) continue;
        else Delete(TheLargestElementInTheTree);
    }
    Insert(n);
}
I think it might be a valid approach to do a quicksort, but stop once you are sure the first million are sorted. Below is generalized code I wrote to perform a quicksort until the k lowest elements are positioned, sorted, between indexes 0 and k-1.
If the whole array of n items were sorted, the average case would be approximately n*log(n), which would be huge for n = 1 billion, but I think if you cut off after the k lowest are sorted it would be n*log(k), which would be better for k = 1,000,000.
I guess for n = 1,000,000,000 and k = 1,000,000 (base-2 logarithms):
n*log(n) ≈ 30 billion
n*log(k) ≈ 20 billion
so a tapered quicksort would be about 33% faster than the full quicksort in this case.
Please do let me know if I did anything wrong here.
public class QuickSort_K {

    /**
     * QuickSort an array until the k lowest items in the array are located
     * between indexes 0 and k-1.
     * @param array array of n items from which you wish to know the k lowest items
     * @param k the number of items that you wish to know are the lowest
     */
    public static void quickSort_K(final int[] array, final int k) {
        qSort(array, k, 0, array.length - 1);
    }

    private static void qSort(final int[] array, final int k, final int low, final int high) {
        if (k < low) return; // the list is sorted up to the kth element, no need to proceed
        if (low < high) {
            int p = partition(array, low, high);
            qSort(array, k, low, p - 1);
            qSort(array, k, p + 1, high);
        }
    }

    private static int partition(final int[] array, final int low, final int high) {
        int pIndex = choosePivot(array, low, high);
        int pivotValue = array[pIndex];
        swap(array, pIndex, high);
        int storeIndex = low;
        for (int i = low; i <= high - 1; ++i) {
            if (array[i] <= pivotValue) {
                swap(array, i, storeIndex);
                storeIndex++;
            }
        }
        swap(array, storeIndex, high);
        return storeIndex;
    }

    private static void swap(final int[] array, final int index1, final int index2) {
        int temp = array[index1];
        array[index1] = array[index2];
        array[index2] = temp;
    }

    private static int choosePivot(final int[] array, final int low, final int high) {
        // the cast must apply to the whole product; (int) Math.random() alone is always 0,
        // which would make every pivot equal to low
        return low + (int) (Math.random() * (high - low + 1));
    }

    // simple testing
    public static void main(String[] args) {
        int[] array = {5, 4, 7, 10, 8, 2, 1, 9, 6, 3};
        int k = 6;
        quickSort_K(array, k);
        int max = k;
        if (k > array.length) {
            max = array.length;
        }
        for (int i = 0; i < max; ++i) {
            System.out.print(array[i] + " ");
        }
        System.out.println();
    }
}
It is a bad idea to load all one billion distances into memory at once. The data will be given in some kind of file stored on disk, hopefully.
Let's say our memory can hold at most 'M' distances. Load the first 'M' distances, sort them using quicksort, and store them in file_1. Repeat for the rest of the data.
Now you have some 'X' files, all sorted.
Perform a merge on the 'X' sorted files to get the first K elements.
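The merge step can be sketched with a priority queue over the heads of the sorted runs. In this illustrative sketch (class and method names are mine) each "file" is modeled as an in-memory sorted array; a real implementation would stream records from disk instead:

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
    // Returns the K smallest values across the sorted runs.
    public static int[] firstK(List<int[]> sortedRuns, int K) {
        // Heap entries: {value, runIndex, offsetWithinRun}, ordered by value.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        for (int r = 0; r < sortedRuns.size(); r++) {
            if (sortedRuns.get(r).length > 0) {
                heap.add(new int[]{sortedRuns.get(r)[0], r, 0});
            }
        }
        int[] result = new int[K];
        int filled = 0;
        while (filled < K && !heap.isEmpty()) {
            int[] top = heap.poll();              // overall smallest remaining value
            result[filled++] = top[0];
            int r = top[1], next = top[2] + 1;
            if (next < sortedRuns.get(r).length) { // advance within that run
                heap.add(new int[]{sortedRuns.get(r)[next], r, next});
            }
        }
        return Arrays.copyOf(result, filled);
    }

    public static void main(String[] args) {
        List<int[]> runs = Arrays.asList(
            new int[]{1, 4, 7}, new int[]{2, 5, 8}, new int[]{0, 3, 6});
        System.out.println(Arrays.toString(firstK(runs, 5))); // prints [0, 1, 2, 3, 4]
    }
}
```

Since the heap only ever holds one entry per run, the merge costs O(K log X) after the O(N/M * M log M) sorting passes.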
I would propose the solution via a heap. Since sorting all the stars and keeping the first million is not feasible (O(N log N)), one can use a heap, which yields O(N log K) where K is the size of the short list. Notice that we are not even asked to return these stars in sorted order.
Let's try to design an algorithm:
(i) Use a max heap (or a max priority queue if you will)
(ii) Fill the heap with the first one million values
(iii) Compare each new value with the top of the heap
a) if bigger, do nothing
b) if smaller, replace the top of the heap
Finally, the heap contains the 1 million closest stars. A sample code is shown below:
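A minimal sketch of those steps in Java, assuming `java.util.PriorityQueue` with a reversed comparator serves as the max heap (names such as `kClosest` are illustrative, not from the original post):

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class ClosestStars {
    // Returns a max heap holding the K smallest distances seen in the stream.
    public static PriorityQueue<Double> kClosest(double[] distances, int K) {
        PriorityQueue<Double> maxHeap =
            new PriorityQueue<>(K, Collections.reverseOrder());
        for (double d : distances) {
            if (maxHeap.size() < K) {
                maxHeap.add(d);            // (ii) fill with the first K values
            } else if (d < maxHeap.peek()) {
                maxHeap.poll();            // (iii b) evict the farthest of the current K
                maxHeap.add(d);
            }                              // (iii a) otherwise do nothing
        }
        return maxHeap;                    // K closest stars, built in O(N log K)
    }

    public static void main(String[] args) {
        double[] distances = {4.2, 0.5, 9.9, 1.3, 7.7, 2.1};
        System.out.println(kClosest(distances, 3)); // contains 0.5, 1.3, 2.1
    }
}
```

In practice `distances` would be read from the input stream one value at a time, so only the K-element heap ever lives in memory.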
- autoboli April 27, 2015