Google Interview Question
Software Engineer in TestsInterview Type: In-Person
EDIT: If further asked about database implementation. Let say you have all the distances how would you find top 10.
use batch for finding top 10, divide the whole data in the set of 20 each(keep the last one of whatever size you have). Find top 10 in each..drop the rest from each group and so on.....
In general, when you have N points and you want to find the closest M points to a special point P, you:
1) Create a min heap of size M
2) Loop through the N points and add them to the min heap based on the comparison X < Y when distance(X,P) < distance(Y,P).
3) Print the min heap in increasing order.
The total run time is O(N*lg(M) + M*lg(M)) == O(N*lg(M)). In this special case, you would need to determine how to get the N points in the first place. If hotels are stored (and indexed) by lat/long coordinates, you would likely query based on a circle C with center P... i.e. get hotels X where
(X.lat - P.lat)^2 + (X.long - P.long)^2 < R^2
Depending on the database you use, you can likely order order query results by distance to P and limit to M results.
If querying based on a distance function is not possible, you can simplify the query to use a square S with center P having only inequalities:
(P.lat - D/2) < X.lat < (P.lat + D/2) && (P.long - D/2) < X.long < (P.long + D/2)
and then run the algorithm above using the heap. But there is an issue: it is possible for S to contain N points, but not the *closest* M points to the center P. For example, S could have a hotel on one of the corners but there could be a closer hotel outside S very close to one of the sides of S. Here you would need to choose an extra large square (whatever that means) to ensure you have the correct set of points.
If you are able to design the app yourself, then you can likely choose a database which makes the query simple and avoid any post processing of the query results.
The following is my idea. I am not sure if this is right. If the distance to hotels is given in the form array. Then take the 1st ten distance and sort them in place. Keep a max value which has the max value from first 10. Start a loop from the 11th distance and if that distance is less than the max then insert this value in the correct position in the first 10 elements and find the new max. By iterating through the loop once we will get closest 10 in the first 10 place of the array.
The following is my idea. I am not sure if this is right. If the distance to hotels is given in the form array. Then take the 1st ten distance and sort them in place. Keep a max value which has the max value from first 10. Start a loop from the 11th distance and if that distance is less than the max then insert this value in the correct position in the first 10 elements and find the new max. By iterating through the loop once we will get closest 10 in the first 10 place of the array.
Store hotels within smaller geo-fences. Find which geo-fence the user is in and search the top 10 closest hotels by processing all the hotels in the geo-fence. If there is not enough hotels for 10, then search neighbouring geo-fences. The performance of this really depends on how you define your geo-fences. Maybe define the geo-fences dynamically based on the density of hotels within that region. Since hotels are not added frequently this can probably be just a manual process.
This is a problem which can be solved by MapR.
Step 1: Get the g = Geo Location of the User O (1)
Step 2: Design Mapper as: Load all Hotels from a DB/File or remote stream; Split it and record <{Hotel_Name, Address}, {Geo Location}> as <key, value> pair.
Step 3: Design Reducer as Follows: Create a TreeMap (or Hash Map). For every <K, V>, compute the d = distance/displacement from 'g' with the values in V.
If the hashmap size is less than 10, add it to the map as <K, V>. Else,
for each <K, V> in HashMap, if distance (g, V) > d; replace <K, V> with current <Key Value> pair.
Override the cleanup method to throw the contents of the HashMap
Here is my take on this problem. I was asked same question in my interview with Goldman Sachs, and the interviewer was satisfied with my answer.
- MUPala June 25, 2015This question is not about algorithm. It is about designing a system(an application). So, it is more than just a algorithm.
First how to organize data. In all the earlier posts people talk about maintaining min heap and tree. Well you cannot load position of all the hotels in the world in you RAM to process for the nearest hotels. You need to persist data in a large database and use DBMS to exploit query stats and abstraction for this. There is no point in loading and organizing data in your RAM(even if you want to do it how would you organize it? 2D Grid? Most of the information is not useful!!! you need some kind of filtering)
Abstraction:Lets say, If a location is around new york(can be decide based on GPS in o(1)) then query the DATABASE VIEW(read sql for what is VIEW) for NEW york. the query would be to calculate distance sqrt(x*x+y*y) (please note: no need of displacement, can be discussed with interviewer). Sort the results and return top 10 from table.
There can be different abstraction on same database or different database for different countries(this is the approach used in big application to organize data on multiple systems)