Let's say a website attracts a

Flipkart Interview Question for SDE1s

0

of 0 votes

13
Answers
Let's say a website attracts a lot of traffic . How would you find at what instant of time (milliseconds or seconds , assume as you want) the traffic was super high ? What data structure would you go for and why ?
- AlgoBaba May 20, 2014 in India | Report Duplicate | Flag | PURGE
Flipkart SDE1 Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: India
Interview Type: Phone Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 5 vote

Do you even need a Data structure for this ? The question says that you want to find the timestamp for which the traffic was maximum. Just keep a max variable and keep on updating it. Whenever max gets updated store the timestamp in a variable.

- Abhi May 20, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Correct...
Also we can prepare a graph b/w no. of connection to website v/s time for better analysis.

- PKT May 21, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I don't think the solution is complete here in any sense.

- Murali Mohan July 24, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

Look at the log, count number of rows, group by time stamp and order by time.

- LAMP Guy May 23, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

This is the solution!

- Murali Mohan July 24, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

This is the DB way of doing it. Anyways we never index logs which are semi structured data to DB.

- naveen kumar October 18, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Suggest to use Hash data structure with defined Hash function based on time stamp. So, when we start recording of traffic, hash function map traffic count to defined time stamp. You can easily take max value from Hash table.

- Rajesh Kumar May 21, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Are we not ignoring the concurrency issues ?
To define the real problem, Every User Request will be handled by particular Servlet Thread. There will be parallel updates if we try to update a common variable which stores max request count.

How do we deal with that?
This makes me think How actually Server writes access logs to particular log file?

- Aman May 30, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

if we try to update a common variable which stores max request count.
How do we deal with that?
I think in this case we declare variable as volatile.

- Pardeep Kumar August 28, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Usually websites like Flipkart will have 100s of servers. So what we can do is -
1. Every server we will maintain a map of MAP(Timestamp -> Number of concurrent connections) . This number of concurrent connections can be fetched by using Nagios which runs as a daemon on the server and counts the number of Apache processes in case the server is using apache.
2. Next we can have a worker server which will have a task of iterating through the timestamp and collecting the concurrent connections for that timestamp from all the servers , summing them up and storing it in some variable. Single vairable will give the max number of connections and at what timestamp that happened.

- naveen kumar October 18, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

There are multiple ways to solve this problem.

Assumption is its ok for a lag of 1-2 sec.

For extremely large traffic: (This is an example of Live counters)
1. Create a provider and subscriber model (or real-time use RabbitMQ etc). Request goes to the queue in the format of (TimeStamps (epoc), some other info).
2. Listener will be responsible for reading and updating a hashtable like data structure to maintain the count for the Millis.

For large load: and faster search:
1. Use inverted index with count (Lucene index). and update the counter (no concurrency or another bottleneck). or
2. Or use Redis to counter increment for the timestamp key.

- Jamal November 19, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

For such requirements use a DB like cassandra and use the counter for keeping a count based on time stamp. And then have a time window say 2 sec and find the time when traffic was peak.
If you want real time, Try to use maxHeap based on the count stored in cassandra.

- Akshay December 14, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

If you have to do it in production for scaling for petabytes of data and support other use cases as well.
Have a message bus like kafka to collect data. Message format can be something like <timestamp> , <URL>, Hits. Use Storm for processing these messages in realtime. write a storm bolt to process and find current maximum and timestamp associated with it.

- KKK March 31, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

CareerCup

Flipkart Interview Question for SDE1s

Books

Videos

Resume Review

Mock Interviews