Bankbazaar Interview Question

Country: India
Interview Type: Phone Interview

Hmm, I think the answer is: it depends!

It depends on the type of query and the type of data.

That said, I do agree that one of the solutions could be a MapReduce approach.

But there are several possible approaches:
#1 MapReduce across multiple servers, with clustering to narrow down which servers the query hits.
#2 In-memory read-through/write-through cache.
#3 Common data duplicated across many records, so one lookup replaces multiple sub-queries.

#1 MapReduce:
A query system could use brute force: query every server, have each one return whatever data it found, and reduce the results through an aggregation function. This can be improved with a clustering scheme that maps the query to only certain servers.
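As a rough sketch of the idea above, here is a toy scatter-gather query in Python. The `SERVERS` dict, the `region` clustering key, and the function names are all made up for illustration; a real system would fan out over the network instead of looping over in-memory dicts.

```python
from collections import Counter

# Hypothetical shards: each "server" holds some rows and is tagged with a
# clustering key (here, a region) that the router can use to skip it.
SERVERS = {
    "s1": {"region": "north", "rows": ["delhi", "delhi", "chandigarh"]},
    "s2": {"region": "south", "rows": ["chennai", "bangalore", "chennai"]},
    "s3": {"region": "north", "rows": ["delhi", "jaipur"]},
}


def map_phase(server, term):
    """Run the query locally on one server and return a partial count."""
    return Counter(row for row in server["rows"] if row == term)


def reduce_phase(partials):
    """Aggregate the partial counts returned by the servers."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def query(term, region=None):
    """Return (match_count, servers_contacted).

    With region=None this is the brute-force version that asks every
    server. Passing a region only asks the servers in that cluster.
    """
    targets = [
        s for s in SERVERS.values()
        if region is None or s["region"] == region
    ]
    result = reduce_phase(map_phase(s, term) for s in targets)
    return result[term], len(targets)
```

Calling `query("delhi")` contacts all three servers, while `query("delhi", region="north")` gets the same count from only two.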

#2 In-memory read-through/write-through cache
This approach is crucial for keeping queries responsive on a big system. You always query the cache, and the cache in turn queries the underlying storage when it does not already hold the data.
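A minimal sketch of that cache in Python, assuming a plain dict stands in for the slow underlying storage. The class name and the `storage_reads` counter are only there for illustration, and eviction is left out entirely.

```python
class ReadWriteThroughCache:
    """Callers only talk to the cache; the cache talks to the storage."""

    def __init__(self, storage):
        self.storage = storage   # assumption: a dict stands in for the real store
        self.cache = {}
        self.storage_reads = 0   # counts how often we had to hit the storage

    def get(self, key):
        # Read-through: serve from memory, fall back to storage on a miss.
        if key in self.cache:
            return self.cache[key]
        self.storage_reads += 1
        value = self.storage.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write-through: update the storage and the cache together.
        self.storage[key] = value
        self.cache[key] = value
```

Repeated `get` calls for the same key only reach the storage once, and a `put` is visible both in the cache and in the backing dict.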

#3 Common Data
Also, let's say the query has a very limited scope, but the data needs to be retrieved from multiple places.

We could sacrifice consistency, as social media is not critical like banking, where all operations dealing with money need to be atomic.

For example: a user has friends, so when the user logs in they need to see their friends' names, DOBs, or any other data that normally stays the same.

So the data could be set up so that a User record contains a cached copy of the names and DOBs of all of the user's friends. This means the data is heavily replicated, and it is fine if it is not up to date. Queries then become a sort of hashtable lookup: you use the hash code to pick the machine and then fetch the record from that machine.
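Something like the following Python sketch, assuming four machines modelled as in-memory dicts and CRC32 as the hash. The record layout and function names are hypothetical, and the friend data is deliberately a possibly stale copy.

```python
import zlib

NUM_MACHINES = 4  # assumption: a small fixed cluster
machines = [dict() for _ in range(NUM_MACHINES)]


def machine_for(user_id):
    """Map a user id to a machine index with a stable hash.

    zlib.crc32 is used because Python's built-in hash() for strings
    changes between processes.
    """
    return zlib.crc32(user_id.encode("utf-8")) % NUM_MACHINES


def save_user(user_id, name, dob, friends):
    """Store the user together with a copy of each friend's name and DOB."""
    machines[machine_for(user_id)][user_id] = {
        "name": name,
        "dob": dob,
        "friends": friends,  # replicated copies, allowed to be out of date
    }


def load_user(user_id):
    """One hash, one machine, one record: no sub-queries for the friends."""
    return machines[machine_for(user_id)].get(user_id)
```

After `save_user("alice", "Alice", "1990-01-01", [{"name": "Bob", "dob": "1991-02-02"}])`, a single `load_user("alice")` returns Alice's record with Bob's details already embedded.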

- Nelson Perez March 24, 2015

