A point in 3-d is defined by (

Goldman Sachs Interview Question for Software Engineer / Developers

0

of 0 votes

20
Answers
A point in 3-d is defined by (x,y,z). Distance d between any two points (X,Y,Z) and (x,y,z) is d= Sqrt[(X-x)^2 + (Y-y)^2 + (Z-z)^2].
Now there are a million entries in a file, each entry is some point in space, in no specific order. Given any point (a,b,c) find the nearest 10 points to it. How would you store the million points and how would you retrieve those 10 points from that data structure.
- chethan March 06, 2010 | Report Duplicate | Flag | PURGE
Goldman Sachs Software Engineer / Developer Algorithm

Email me when people comment.

An error occurred in subscribing you.

More Questions from This Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 0 vote

Sorted and using binary search

- Anonymous March 06, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Using what order, sir?

- Anonymous March 06, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

i think in order to store the points i will use

struct { 
             int x,y,z;
             float distance; // Distance from (0,0,0) by using the above formula
            }

will store the data in sorted order based on distance

Any inputs on above approach ??

- sach March 06, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

How do you find the 10 points closest to (a,b,c)?

- Anonymous March 06, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

The interviewer said Heap Data structure was the way to go. I am still realising it.

- xankar March 07, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I've written code to accomplish this very task. Your interviewer is wrong.

- Anonymous March 07, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

ok. can you please post the code with some details?

- xankar March 07, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Min Heap

- Anonymous March 07, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

linear structure -> binary search(tree)
pair structure -> quad tree
3d structure -> octal tree?

- Zaphod March 08, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

using octal tree, searching scenario could be very complicated.

if points are very sparse around the base point, numbers of candidate bounding boxes will grow,
if points are very dense around the base point, numbers of points bounding box will grow.

need some consideration about input data traits.

- Zaphod March 08, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

hm.. the interviewee only mentioned about number of point.

if coordination boundary is small enough, just map those points to 3d array, and just search the array with nested loop with distance checking.

and if mentioned threshold(==10 distance) is fixed, build searching address table. this could improve search time performance.

- Zaphod March 08, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Its best to sort it based on the distance from Orgin and organize the sorted result on a Min heap(probably) and then given a point consider all points with similar distance and among them eleminate those points that are diagonally opposite to given point but have the same distance. but still we may have to consider a lot of points before we can make that elemination.

- Harish March 18, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I have considered this and the more general problem of searching in a metric space in the past. I am pretty sure that most of these solutions here are bad.

If you only have to find the n-nearest neighbors of a few of the points, it would likely be faster to just scan the data linearly and keep track of the n closest seen neighbors.

If the lookup is performed many times, the points should be put into some spatial data structure. You could map them into a 2-D array, a 3-D array, or you could insert them all into a oct-tree. The benefits of these are that you can apply the triangle inequality to eliminate large amounts of points from consideration when looking for the n nearest neighbors. The choice of spatial data structure will necessarily depend upon the distribution of the points to the 3-d space.

The idea of using a heap is essentially a spatial data structure, where the locations can be compared using their distance from the origin. The triangle inequality will allow filtering out of a bunch of points, but since we are in a 3-d domain, a 3-d data structure can possibly perform better than a 1-d structure like a single heap.

- Isaac Dooley May 16, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Isaac Dooley is mostly right. Why would you sort a million records if all you need is the 10 closest?

To find the 10, scan the data sequentially. Calculate the first ten distances. Then find the max distance among these ten and set variable max_d to it. From this point on any point whose distance is better than max_d will replace the point whose distance is max_d in our list of ten. Then max_d is recalculated and the process continues. To further speed up the process, for a point whose x is more than a+max_d or whose y is more than b+max_d or whose z is more than c+max_d we will not even bother calculating the distance because it will certainly not be below max_d.

As for the best way to store the data, the information given is incomplete. It depends on how I intend to use this data in the future. For the calculation of the ten closest I would only store the ten in a 10x4 array (or an array of size 10 of a struct like the one defined by sach above) and not keep the other million entries in memory.

- Mohammad June 10, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Boss,you have to search for max element in this array......why don't you use heap structure for this....just think abt it.

- Nits July 14, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

since we have to find all points within a distance of 10 from a given point say A(x,y,z) we will need to find all points having thr co-ordinates within a range of x-10 to x+10;y-10,y+10;z-10,z+10 .after that we can seleclt from those handful of points those who r inside the sphere.
So the data structure should be such that we can search for a range of values of x,y,z pair which can be efficietly done using a btree or b+ tree index using xyz as a key to search for points.time complexity will be O(logn);

- mayank July 23, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Mayank, Problem is find 10 closest points to a point (x,y,z) and not points at a distance of 10.

@others, as I understand using a heap data structure is a solution, What would be the complexity of the program? (I am new to this D.S)

Help me if I am wrong,
My Understanding of algo

1. For every point, calculate the distance with the reference point and using this distance as the key, insert it in a heap D.S (but where would one store the actual point corresponding to that key?)

2. traverse the tree breadth wise from the bottom of the tree upto 10 elements.

- Anonymous September 29, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I believe this code should work...

#include <iostream>
#include <fstream>
#include <sstream>
#include <map>
#include <vector>
#include <string>
#include <stdlib.h>
#include <math.h>

using namespace std;

struct data
{
        double x;
        double y;
        double z;
};

typedef multimap<double,data> mymap;

int main()
{
        string s;
        data dt;
        mymap calc;
        ifstream ifs("points.txt");
        if (!ifs.is_open()) {
                cout << "count not open file points.txt for reading" << endl;
                exit(1);
        }
        int a=5,b=6,c=3;
        cout << "Sides: " << a << " " << b << " " << c << endl;
        while( ifs >> s )

                istringstream buf(s);
                string token;
                vector<int> v;
                for(int i; getline(buf, token, ','); ++i )
                        v.push_back(atoi(token.c_str()));
                dt.x=v[0];
                dt.y=v[1];
                dt.z=v[2];
                double d = sqrt(pow(dt.x-a,2)+pow(dt.y-b,2)+pow(dt.z-c,2));
                calc.insert(make_pair(d, dt));
        }

        cout << "Closest elements:" << endl;
        mymap::const_iterator iter=calc.begin();
        for(int i=0; i<10; ++i)
        {
                data d1=iter->second;
                cout << iter->first << " \t " << d1.x << "," << d1.y << "," << d1.z << endl;
                ++iter;
        }
        return 0;
}

- Mike March 19, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Just quickselect the 10th value, where comparisons throughout the quickselect procedure use distance to the point as the comparator. this is O(N) time, O(1) space, but it mutates the array.

- anon February 04, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Sorry and to clarify:

after the quickselect process, everything to the left of the 10th value is smaller. so values 1-10 in the array are the 10 closest points.

- anon February 04, 2014 | Flag

CareerCup

Goldman Sachs Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews