AppNexus Interview Question
Software EngineersCountry: United States
Interview Type: Written Test
You need two tables to answer those two questions:
First table uses user_id as the partition key, time as the cluster key ordered descending, and location key as the value. The record should be inserted with a TTL that meets the 2h limit. A new location will add a new entry to this partition with a new/top timestamp. Since it is time-series data with the same TTL, the older ones will always expire first.
At the same time, you insert an entry that is partitioned on location_id, that is clustered on user_id. Each insert will either add a new entry or will replace an existing entry with a new TTL (still within 2 hr from the moment of check in). Optional to record check-in time in this case.
There are 2 queries in front of us, ignore the constraints for a while.
- Balaji June 29, 2017To answer query 1 - Simple redis with key being user unique id and value being the geo location is enough. O(1)
Query 2 - For any given location return user in that location. Can be achieved by partitioning the users based on the location, while inserting. Whenever location query is made, simply find the partitions and return all the users in it.
To extend and support constraints.
1. If the data is huge, shard the users based on location and store in multiple replicas. If too many users checks in, at a particular location have more buckets for that location. Say limit is 1000 users/replica.
2. If the check-in query increases by 10k, have a config servers, which stores the replica location for the given pair and store it there.
3. If based on location fetch query increases. We can launch parallel process based on number of buckets which has the required data and list out.
If based on location fetch query increases for a particular region. The replicas of the bucket would help resolving.
4. What if the location fetch requires data from multiple location bucket. Say its in intersection of multiple, here it would require checking user location in the bucket before returning.
Except case 4 all other requires constant retrieval time. Open for critics.