Flipkart Interview Question for SDE1s


Country: India
Interview Type: Phone Interview




Comment hidden because of low score. Click to expand.
3
of 5 vote

Do you even need a Data structure for this ? The question says that you want to find the timestamp for which the traffic was maximum. Just keep a max variable and keep on updating it. Whenever max gets updated store the timestamp in a variable.

- Abhi May 20, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Correct...
Also we can prepare a graph b/w no. of connection to website v/s time for better analysis.

- PKT May 21, 2014 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

I don't think the solution is complete here in any sense.

- Murali Mohan July 24, 2014 | Flag
Comment hidden because of low score. Click to expand.
1
of 1 vote

Look at the log, count number of rows, group by time stamp and order by time.

- LAMP Guy May 23, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

This is the solution!

- Murali Mohan July 24, 2014 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

This is the DB way of doing it. Anyways we never index logs which are semi structured data to DB.

- naveen kumar October 18, 2014 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Suggest to use Hash data structure with defined Hash function based on time stamp. So, when we start recording of traffic, hash function map traffic count to defined time stamp. You can easily take max value from Hash table.

- Rajesh Kumar May 21, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Are we not ignoring the concurrency issues ?
To define the real problem, Every User Request will be handled by particular Servlet Thread. There will be parallel updates if we try to update a common variable which stores max request count.

How do we deal with that?
This makes me think How actually Server writes access logs to particular log file?

- Aman May 30, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

if we try to update a common variable which stores max request count.
How do we deal with that?
I think in this case we declare variable as volatile.

- Pardeep Kumar August 28, 2014 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Usually websites like Flipkart will have 100s of servers. So what we can do is -
1. Every server we will maintain a map of MAP(Timestamp -> Number of concurrent connections) . This number of concurrent connections can be fetched by using Nagios which runs as a daemon on the server and counts the number of Apache processes in case the server is using apache.
2. Next we can have a worker server which will have a task of iterating through the timestamp and collecting the concurrent connections for that timestamp from all the servers , summing them up and storing it in some variable. Single vairable will give the max number of connections and at what timestamp that happened.

- naveen kumar October 18, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

There are multiple ways to solve this problem.

Assumption is its ok for a lag of 1-2 sec.

For extremely large traffic: (This is an example of Live counters)
1. Create a provider and subscriber model (or real-time use RabbitMQ etc). Request goes to the queue in the format of (TimeStamps (epoc), some other info).
2. Listener will be responsible for reading and updating a hashtable like data structure to maintain the count for the Millis.

For large load: and faster search:
1. Use inverted index with count (Lucene index). and update the counter (no concurrency or another bottleneck). or
2. Or use Redis to counter increment for the timestamp key.

- Jamal November 19, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

For such requirements use a DB like cassandra and use the counter for keeping a count based on time stamp. And then have a time window say 2 sec and find the time when traffic was peak.
If you want real time, Try to use maxHeap based on the count stored in cassandra.

- Akshay December 14, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

If you have to do it in production for scaling for petabytes of data and support other use cases as well.
Have a message bus like kafka to collect data. Message format can be something like <timestamp> , <URL>, Hits. Use Storm for processing these messages in realtime. write a storm bolt to process and find current maximum and timestamp associated with it.

- KKK March 31, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More