Flipkart Interview Question
SDE1sCountry: India
Interview Type: Phone Interview
Correct...
Also we can prepare a graph b/w no. of connection to website v/s time for better analysis.
Are we not ignoring the concurrency issues ?
To define the real problem, Every User Request will be handled by particular Servlet Thread. There will be parallel updates if we try to update a common variable which stores max request count.
How do we deal with that?
This makes me think How actually Server writes access logs to particular log file?
Usually websites like Flipkart will have 100s of servers. So what we can do is -
1. Every server we will maintain a map of MAP(Timestamp -> Number of concurrent connections) . This number of concurrent connections can be fetched by using Nagios which runs as a daemon on the server and counts the number of Apache processes in case the server is using apache.
2. Next we can have a worker server which will have a task of iterating through the timestamp and collecting the concurrent connections for that timestamp from all the servers , summing them up and storing it in some variable. Single vairable will give the max number of connections and at what timestamp that happened.
There are multiple ways to solve this problem.
Assumption is its ok for a lag of 1-2 sec.
For extremely large traffic: (This is an example of Live counters)
1. Create a provider and subscriber model (or real-time use RabbitMQ etc). Request goes to the queue in the format of (TimeStamps (epoc), some other info).
2. Listener will be responsible for reading and updating a hashtable like data structure to maintain the count for the Millis.
For large load: and faster search:
1. Use inverted index with count (Lucene index). and update the counter (no concurrency or another bottleneck). or
2. Or use Redis to counter increment for the timestamp key.
If you have to do it in production for scaling for petabytes of data and support other use cases as well.
Have a message bus like kafka to collect data. Message format can be something like <timestamp> , <URL>, Hits. Use Storm for processing these messages in realtime. write a storm bolt to process and find current maximum and timestamp associated with it.
Do you even need a Data structure for this ? The question says that you want to find the timestamp for which the traffic was maximum. Just keep a max variable and keep on updating it. Whenever max gets updated store the timestamp in a variable.
- Abhi May 20, 2014