Amazon Interview Question
Software Engineer / DevelopersCountry: India
I'm confused. Why min heap and removing the root with the smallest distance? Don't we want the K co-ordinates with the smallest distances? Shouldn't we do this:
Keep a size K max heap, and if current co-ordinates' distance is smaller than root (which is the Kth smallest distance seen so far), we know the root does not represent the Kth smallest anymore, we remove root and insert the the distance of the current co-ordinates, and root of our max heap will represent the Kth smallest distance again.
While we maintain the max heap, we should also associate each co-ordinates discovered with the heap node that represents their distance from origin, probably using a linked list. So that after we have traversed all the co-ordinates, at the top of our max heap is the node representing the Kth smallest distance, and we have a linked list of co-ordinates that are that far from origin, and we output those, because there can be multiple of them.
assuming memory is not a problem:
for all given points calculate: x^2+y^2;
Store the result in a map<coordinates,distance>;
sort by distance;
Simple and efficient idea.
We can use min heap for maintaining the k-largest elements (x^2+y^2 values) for reducing the space complexity.
1. When the question specifies it's a large file, you have to assume it doesn't fit in memory.
2. Why do you need to sort all the item if you only want the top K points ? You can find the top K without sorting everything.
Bad solution. Very surprised about the positive votes.
As stated above you can use a min heap of size K which will store the top K points, and compare the distance of each point read from the file with the point with the minimum distance in the heap (root of the heap), and replace the root if the current point has longer distance.
@prithvi why the heck use a map ??? ,here is no need to perform lookup by key here.
Also you cant sort a hashtable, so you better just use a regular array. Unless you mean a sorted map such as a TreeMap, but then you wouldnt need to sort it in the end because its already sorted....
Anyway your answer is just the worst solution Ive seen in a long time, and a definite rejection by interviewer.
The answer i gave was the 1st thing that came to my mind. its is not a very good answer after i made the assumption of memory.
To ans gen-y-s: A map to keep track of distance and corresponding coordinates. (i think a Treemap is sorted by key. A tree map requires a value comparator. Am i wrong?)
i agree Sorting the coordinates by the distance in array. i hope you are not taking about sorting of distances alone .
But corsider more than 1 point at same distance.
defining heapsize k might not be good enough:
Consider a case where there are 1 million points, its possible that there are 1million -k cordinates that are at k farthest from origin.
Create a data structure to hold the x,y, coordinates and the distance squared from origin. At this point read in all data from the file into the data structures and store them in a dynamically resizing array that doubles when the capacity is reached (subsequently it would divide itself in half when it is at 1/4 capacity). The data would be merge sorted by distance squared from origin in ascending order. At this point, you can locate k number of farthest points by starting at indexlength - k and iterating forward to the end.
This should ensure that as long as memory is not a problem you get linearithmic efficiency on the sorting, linear time on the data insertion, and near constant time on retrieving k nodes.
Should memory become an issue, then you would need to implement the data in a linked list to be able to perform the merge sort in place.
Please correct me if I am wrong.
1. with group of points, we can make a convex hull. the complexity is O(km). I forgot what is k. m is the muber of points on the hull. But you can find the answer from geometry algorithem book.
2. the farthest point is the point on the hull.
3. calculate all distance of points on the hull. find the largest one. it will take O(m)
4. remove the largest distance point. rebuild hull. it will take constant cost C (you do not need to rebuild the whole hull)
5. calculate the distance of new added points it will take almost constant cost C.
6. remove the largest point, do 4 and 5 until you find the kth largest point.
I like the idea of trying to apply this algorithm here. Very unique. I give you a thumbs up for creativity. But consider the below point:
Convex hull doesn't necessarily translate into 'furthest point'. I could have an edge that is only a distance of 1 away from the center and the opposite corner with a distance of 100. Any points between a distance of 1 and the opposite corner would be 'further away' than the one point on the convex hull that is near the origin.
Using a max priority queue of k items.
1. Create a function which gives the distance (square_root( square(x1) + square(y1)). The complexity of function square_root is Olog(n). where n = sum of square or x1 and y1
1. Initially the queue is empty insert k items directly into the priority queue O(klogk). k distances.
2. Now for remaining entries you can simply check with the max element if the new item is less than max remove max and insert that item. Do this for every remaining item in the file. O(nlogk).
3. At the end you'll have k smallest items remaining in the priority queue. The space complexity for this solution is O(k)
Please correct me if anything is wrong.
Two solution possible depending on situation:
Solution 1: More time complexity and less space complexity
Use Max-Heap of size k.
Time complexity: O(n*logk)
Space complexity: O(k)
Solution 2: Same time complexity and more space complexity
Use Map<Cordinate,distance> of size n.
Time complexity: O(n)+O(nlogk)
Space complexity: O(n)
void Comparator( std::map<float, point_t*>& arr, const float data, point_t* file, const size_t k, bool op )
{
if( k > arr.size() ) // collect first k values
{
arr[ data ] = file;
return;
}
std::map<float, point_t*>::iterator iter = op ? std::prev( arr.end() ) : arr.begin();
if( op ? (*iter).first > data : data > (*iter).first )
{
arr[ data ] = file;
if( k < arr.size() ) // pull out wrong value
{
op ? arr.erase( std::prev( arr.end() ) ) : arr.erase( arr.begin() );
}
}
}
int test17900686()
{
const size_t k = 3;
point_t largeFile[] = { {1,2}, {3,4}, {5,6}, {7,8}, {9,1}, {4,5}, {3,5}, {2,6}, {2,7}, {1,4}, {7,8}, {3,7}, {8,3}, {4,9}, {4,3}, {8,3}, {2,2} };
std::map<float, point_t*> Xs, Xl, Ys, Yl; // k smallest and largest
// collect 4 * k lowest and largest coordinates
for( size_t i = 0; i < _countof( largeFile ); i++ )
{
Comparator( Ys, largeFile[i].y, &largeFile[i], k, true );
Comparator( Xs, largeFile[i].x, &largeFile[i], k, true );
Comparator( Yl, largeFile[i].y, &largeFile[i], k, false );
Comparator( Xl, largeFile[i].x, &largeFile[i], k, false );
}
Xs.insert(Xl.begin(), Xl.end()); // joint biggest and smallest X
Ys.insert(Yl.begin(), Yl.end()); // joint biggest and smallest Y
std::map<float, pare_t> dist;
// all combinations of k points
for( std::map<float, point_t*>::iterator iterX = Xs.begin(); iterX != Xs.end(); ++iterX )
{
for( std::map<float, point_t*>::iterator iterY = Ys.begin(); iterY != Ys.end(); ++iterY )
{
if( (*iterX).second == (*iterY).second ){ continue; } // skip the same points
float d1 = ::fabs((*iterX).first - (*iterY).second->x);
float d2 = ::fabs((*iterX).second->y - (*iterY).first);
float distance = d1*d1 + d2*d2;
pare_t pare;
pare.first = (*iterX).second;
pare.second = (*iterY).second;
if( k < dist.size() && (*std::prev( dist.end() )).first > distance ) // skip shortest distances
{
continue;
}
dist[ distance ] = pare;
}
}
std::map<float, pare_t>::reverse_iterator iter = dist.rbegin();
float result[k] = {0};
for( size_t i = 0; i < k && iter != dist.rend(); ++iter, i++ )
{
pare_t pare = (*iter).second;
result[i] = ::sqrt((*iter).first);
std::cout << result[i] << " XY0:" << pare.first->x << "-" << pare.first->y << " XY1:" << pare.second->x << "-" << pare.second->y << std::endl;
}
float expected[k] = {9.43398, 8.544, 7.07107}; // 9-1 : 4-9, 1-4 : 9-1, 1-4 : 8-3
return ::memcmp( expected, result, sizeof(result) );
}
Calculate Eucledian distance of all points from origin. Make a heap of these distances. Remove first k elements from heap and print them.
Otherwise, add all distances to a balanced binary search tree. In each node, distance will act as key and value will be the id of the point. Traverse post order and print first k values.
The key is in "large file". Most likely it is the problem which would mean no storing or creating big sophisticated structures. I think what needs to be addressed is that while going along the file to process distances the once that are candidates (initial k) need to be verified if they still are the longest. The storage then for those is k-elements. No need for big data structure - just sequential read from file wherever it is. It is half practical half thoretical problem. It sounds like min tree of k-elenents could be more sophisticated, but it could be any small k-element structure as initiial insophisticated approach. Then you go levels of sophistication and improvement with analysis.
k farthest points from origin form a circle of radius k. Hence the values |x| and |y| can be in range from 0 to k, with special cases when |x| = k, then y=0, and viceversa.
Hence looping through all values , we can consider only those pairs which have |x|<= k, |y|<=k and |y|^2 = |k|^2-|x|^2.
this takes time O(n)
Please correct me if I am wrong. The time complexity will be high
We can calculate and feed the square of the distance of the 1st co-ordinate from the origin in a linked list.
Calculate the same for the rest of the co-ordinates and that with a greater distance can over right the linked list data, after comparison each time.
Reason to use a linked list is to store all the co-ordinates at the same distance.
when another co-ordinate with a higher value of distance is encountered, all the values in the linked list will be deleted and the current co-ordinate values will be kept in the head of the linked list.
Use min heap of size k.
- Anonymous May 14, 20131. Build min heap with first k co-ordinates distance from origin.
2. Then from then on, compare the root's value with next co-ordinates distance, if root is smaller delete root and insert the new distance.