Goldman Sachs Interview Question
Software Engineer / Developersi think in order to store the points i will use
struct {
int x,y,z;
float distance; // Distance from (0,0,0) by using the above formula
}
will store the data in sorted order based on distance
Any inputs on above approach ??
using octal tree, searching scenario could be very complicated.
if points are very sparse around the base point, numbers of candidate bounding boxes will grow,
if points are very dense around the base point, numbers of points bounding box will grow.
need some consideration about input data traits.
hm.. the interviewee only mentioned about number of point.
if coordination boundary is small enough, just map those points to 3d array, and just search the array with nested loop with distance checking.
and if mentioned threshold(==10 distance) is fixed, build searching address table. this could improve search time performance.
Its best to sort it based on the distance from Orgin and organize the sorted result on a Min heap(probably) and then given a point consider all points with similar distance and among them eleminate those points that are diagonally opposite to given point but have the same distance. but still we may have to consider a lot of points before we can make that elemination.
I have considered this and the more general problem of searching in a metric space in the past. I am pretty sure that most of these solutions here are bad.
If you only have to find the n-nearest neighbors of a few of the points, it would likely be faster to just scan the data linearly and keep track of the n closest seen neighbors.
If the lookup is performed many times, the points should be put into some spatial data structure. You could map them into a 2-D array, a 3-D array, or you could insert them all into a oct-tree. The benefits of these are that you can apply the triangle inequality to eliminate large amounts of points from consideration when looking for the n nearest neighbors. The choice of spatial data structure will necessarily depend upon the distribution of the points to the 3-d space.
The idea of using a heap is essentially a spatial data structure, where the locations can be compared using their distance from the origin. The triangle inequality will allow filtering out of a bunch of points, but since we are in a 3-d domain, a 3-d data structure can possibly perform better than a 1-d structure like a single heap.
Isaac Dooley is mostly right. Why would you sort a million records if all you need is the 10 closest?
To find the 10, scan the data sequentially. Calculate the first ten distances. Then find the max distance among these ten and set variable max_d to it. From this point on any point whose distance is better than max_d will replace the point whose distance is max_d in our list of ten. Then max_d is recalculated and the process continues. To further speed up the process, for a point whose x is more than a+max_d or whose y is more than b+max_d or whose z is more than c+max_d we will not even bother calculating the distance because it will certainly not be below max_d.
As for the best way to store the data, the information given is incomplete. It depends on how I intend to use this data in the future. For the calculation of the ten closest I would only store the ten in a 10x4 array (or an array of size 10 of a struct like the one defined by sach above) and not keep the other million entries in memory.
since we have to find all points within a distance of 10 from a given point say A(x,y,z) we will need to find all points having thr co-ordinates within a range of x-10 to x+10;y-10,y+10;z-10,z+10 .after that we can seleclt from those handful of points those who r inside the sphere.
So the data structure should be such that we can search for a range of values of x,y,z pair which can be efficietly done using a btree or b+ tree index using xyz as a key to search for points.time complexity will be O(logn);
Mayank, Problem is find 10 closest points to a point (x,y,z) and not points at a distance of 10.
@others, as I understand using a heap data structure is a solution, What would be the complexity of the program? (I am new to this D.S)
Help me if I am wrong,
My Understanding of algo
1. For every point, calculate the distance with the reference point and using this distance as the key, insert it in a heap D.S (but where would one store the actual point corresponding to that key?)
2. traverse the tree breadth wise from the bottom of the tree upto 10 elements.
I believe this code should work...
#include <iostream>
#include <fstream>
#include <sstream>
#include <map>
#include <vector>
#include <string>
#include <stdlib.h>
#include <math.h>
using namespace std;
struct data
{
double x;
double y;
double z;
};
typedef multimap<double,data> mymap;
int main()
{
string s;
data dt;
mymap calc;
ifstream ifs("points.txt");
if (!ifs.is_open()) {
cout << "count not open file points.txt for reading" << endl;
exit(1);
}
int a=5,b=6,c=3;
cout << "Sides: " << a << " " << b << " " << c << endl;
while( ifs >> s )
istringstream buf(s);
string token;
vector<int> v;
for(int i; getline(buf, token, ','); ++i )
v.push_back(atoi(token.c_str()));
dt.x=v[0];
dt.y=v[1];
dt.z=v[2];
double d = sqrt(pow(dt.x-a,2)+pow(dt.y-b,2)+pow(dt.z-c,2));
calc.insert(make_pair(d, dt));
}
cout << "Closest elements:" << endl;
mymap::const_iterator iter=calc.begin();
for(int i=0; i<10; ++i)
{
data d1=iter->second;
cout << iter->first << " \t " << d1.x << "," << d1.y << "," << d1.z << endl;
++iter;
}
return 0;
}
Just quickselect the 10th value, where comparisons throughout the quickselect procedure use distance to the point as the comparator. this is O(N) time, O(1) space, but it mutates the array.
Sorted and using binary search
- Anonymous March 06, 2010