Ebay Interview Question
SDE-2s
Team: Traffic
Country: United States
Interview Type: In-Person
That's a good start. It would be good if you could talk about the end-to-end interaction (a sequence diagram) for both the put and get scenarios.
@juny, this is one of those open-ended questions where there can be several good approaches rather than one single correct answer.
- DevGuy January 24, 2014
Some points that I can see right now:
1) There are lots of events per day: 1 billion events / 86,400 s ≈ 11,600 events/s on average, i.e. O(10^4) events/sec. We should be able to handle this load.
2) Since nothing is said about how events are distributed over the day, there can be quiet periods followed by big spikes in incoming events. We should be able to handle these spikes gracefully.
3) The data structure we choose has to perform well for the core query: returning the most recent events in a given time window (here, the last 60 seconds).
4) Holding 1 billion events in a single machine's RAM may not be feasible unless individual events are small. Even if they are small, we may need to load additional data for each event into RAM, and with many events that could quickly use up the available memory.
5) The system must have very little downtime, since so many events arrive every second.
6) What about persistence of events? Do we need to store every event that comes in? Maybe; we might need to run some analysis on the incoming events later.
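The load and memory concerns in (1), (2) and (4) can be made concrete with a quick back-of-envelope calculation. This is a hedged sketch: only the 1 billion events/day figure comes from the question; the 10x peak factor and the 200-byte average event size are hypothetical values chosen for illustration.

```python
# Back-of-envelope capacity estimate.
# EVENTS_PER_DAY is from the question; PEAK_FACTOR and EVENT_SIZE_BYTES
# are assumed values for illustration only.

EVENTS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
PEAK_FACTOR = 10                # assumed: spikes reach 10x the average rate
EVENT_SIZE_BYTES = 200          # assumed: average serialized event size


def average_rate(events_per_day: int) -> float:
    """Average events per second if events were spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


def daily_storage_gb(events_per_day: int, event_size: int) -> float:
    """Raw bytes written per day, expressed in gigabytes (10**9 bytes)."""
    return events_per_day * event_size / 1e9


def window_size(rate: float, window_seconds: int = 60) -> int:
    """Number of events held in a sliding window of the given length."""
    return int(rate * window_seconds)


if __name__ == "__main__":
    avg = average_rate(EVENTS_PER_DAY)
    peak = avg * PEAK_FACTOR
    print(f"average rate: {avg:,.0f} events/s")
    print(f"assumed peak: {peak:,.0f} events/s")
    print(f"last-60s window at peak: {window_size(peak):,} events")
    print(f"raw storage per day: {daily_storage_gb(EVENTS_PER_DAY, EVENT_SIZE_BYTES):,.0f} GB")
```

Under these assumed numbers the raw data comes to roughly 200 GB per day (about 1.4 TB per week), which is why (4) and (6) point toward spreading events across machines and using durable storage.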
Some quick ideas for the points above:
(1) and (2): Load balancers can help here. Queues can also hold incoming requests so that spikes do not overwhelm the system.
(3): Queues or lists that hold items in timestamp order can do the job here, since the newest events sit at one end and expired events can be dropped from the other.
(4), (5) and (6): We need distributed data structures to handle these. We could partition the huge list of events across machines, with each machine holding its own list of events ordered by timestamp. When a query arrives, every machine returns its events from the last 60 seconds; these are merged and sent back to the user who queried. You could think of building this on frameworks like Hadoop. For persistence of events you could use HBase (a single hard disk would probably not be able to hold all the event data after a few weeks).
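For (1) and (2), the idea of queuing requests so that spikes do not overwhelm the system can be sketched with a bounded in-memory buffer. This is a minimal single-process illustration using Python's standard `queue.Queue`; the `IngestBuffer` class, its capacity, and the reject-when-full policy are hypothetical choices, and a production system would more likely sit behind a durable, distributed queue.

```python
import queue
from typing import Any, List


class IngestBuffer:
    """Hypothetical bounded buffer between the load balancer and the workers.

    Bursts fill the buffer; workers drain it at their own pace. When the
    buffer is full, new events are rejected (and counted) instead of piling
    up without bound, which is one possible back-pressure policy.
    """

    def __init__(self, capacity: int) -> None:
        self._q: "queue.Queue[Any]" = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def offer(self, event: Any) -> bool:
        """Try to enqueue without blocking; return False if the buffer is full."""
        try:
            self._q.put_nowait(event)
            return True
        except queue.Full:
            self.rejected += 1  # caller may retry later, shed load, or spill to disk
            return False

    def drain(self, max_batch: int) -> List[Any]:
        """Remove up to max_batch events, oldest first, for a worker to process."""
        batch: List[Any] = []
        while len(batch) < max_batch:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch


if __name__ == "__main__":
    buf = IngestBuffer(capacity=5)
    accepted = sum(buf.offer(f"event-{i}") for i in range(8))  # a burst of 8
    print(accepted, buf.rejected)   # 5 3
    print(buf.drain(max_batch=3))   # ['event-0', 'event-1', 'event-2']
```

Rejecting when full is only one option; blocking the producer with `put(timeout=...)` is another, at the cost of pushing latency back to the clients.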
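Points (3) through (6) can be combined into one sketch: each shard keeps its events in a timestamp-ordered deque, evicts anything older than the 60-second window, and a get query gathers each shard's recent events and k-way merges them with `heapq.merge`. This is an in-process simulation under stated assumptions: the `Event`, `Shard`, and `Cluster` names and the round-robin placement are hypothetical, each shard is assumed to receive events in non-decreasing timestamp order, and in a real deployment the shards would be separate machines reached over the network.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass(order=True)
class Event:
    timestamp: float                     # seconds; ordering uses only this field
    payload: str = field(compare=False)


class Shard:
    """One machine's slice of the stream, kept in timestamp order."""

    def __init__(self, retention: float = 60.0) -> None:
        self.retention = retention
        self.events: Deque[Event] = deque()

    def put(self, event: Event) -> None:
        # Assumption: events reach a shard in non-decreasing timestamp order
        # (e.g. stamped on ingestion), so appending keeps the deque sorted.
        self.events.append(event)

    def evict(self, now: float) -> None:
        # The oldest events sit at the left end; drop those outside retention.
        cutoff = now - self.retention
        while self.events and self.events[0].timestamp < cutoff:
            self.events.popleft()

    def recent(self, now: float, window: float) -> List[Event]:
        """Events with timestamp >= now - window, oldest first."""
        self.evict(now)
        cutoff = now - window
        out: List[Event] = []
        # Walk from the newest end and stop at the first event that is too old.
        for event in reversed(self.events):
            if event.timestamp < cutoff:
                break
            out.append(event)
        out.reverse()
        return out


class Cluster:
    """Hypothetical front end that spreads events over shards round-robin."""

    def __init__(self, num_shards: int, retention: float = 60.0) -> None:
        self.shards = [Shard(retention) for _ in range(num_shards)]
        self._next = 0

    def put(self, event: Event) -> None:
        self.shards[self._next].put(event)
        self._next = (self._next + 1) % len(self.shards)

    def get_recent(self, now: float, window: float = 60.0) -> List[Event]:
        # Scatter: each shard returns its own sorted slice.
        # Gather: k-way merge of the already-sorted slices.
        per_shard = [shard.recent(now, window) for shard in self.shards]
        return list(heapq.merge(*per_shard))


if __name__ == "__main__":
    cluster = Cluster(num_shards=3)
    for t in range(0, 100, 10):
        cluster.put(Event(float(t), f"e{t}"))
    print([e.payload for e in cluster.get_recent(now=100.0)])
    # ['e40', 'e50', 'e60', 'e70', 'e80', 'e90']
```

Because every shard already returns a sorted slice, the merge costs O(n log k) for n returned events across k shards, instead of re-sorting everything on each query.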
Obviously, this is not a complete solution and there are flaws. Hope this helps!