Google Interview Question for Software Engineers


Country: United States
Interview Type: In-Person




Comment hidden because of low score. Click to expand.
0
of 0 vote

That's a systems design question, not coding, I assume.

- I define the thing I want the rolling statistics as fact (e.g. user id, hashtag, ...)
- Then I would split the time window into slots such a way that the timewindow / slots provides a reasonable update rate for the statistics while it keeps the number of slots small enough for a feasible solution.
- Now I have two things: a queue which represents a time slot. Each queue items holds a hash table which counts for the seen facts the occurrences (frequency)
- Further I keep a hashtable that will count the occurrences total, it will have the fact as key and a refrence to a tree-item as value. The tree-item has the frequency as key and the fact as value.
- If an item arrives, I will update the head of the queue or create a new queue item if a new slot is touched by the timestamp of the arriving item. I will as well increment the frequency of that item in my hash-table->tree-item structure
- I will check as well the oldest items in the queue and if they are older than the new item arrived, I will subtract the key-value pairs from my main statistic and throw away that queue item away

Now I can always query my tree for the top-k keys and receive the the top-k items.

This design has a few assumptions and weak points:
- it is optimized for a high query rate on top most k items.
- it is slightly wrong, because I add items as they arrive to the hash-table -> tree-item structure, but remove them with a lower granularity of time per slot
- I only update the statistics if items arrive, assuming, items arrive always and if not, the statistics relatives don't change (which is true, but maybe top k should be an empty list if no items arrived longer than the moving window)

For high availability and fault tolerance of single servers:
- a number of 'frequency aggregatos' that receive the events and assign them to time slots (either synchronized time or fine grained ... the whole time discussion could fill a couple of minutes here). Here I could place a load balancer in front or if I control the clients assign different clients a different aggregator and fallback if primary for that client fails, etc..
- multiple 'redundant statistics runners' that will receive data from aggregator and sum up to final statistics as pointed out in prior section. The aggregator will translate thousand or hundereds of thouasands of requests to just one per time slot, which will be a list of key-value pairs it will forward to the main server. It might already cut away the long tail in order to minimize traffic... but that's another discussion because it's not that easy, one might need to consider re-elevating long-tail's that peak up or if very small time slots are considered, if a fact occurs exactly once in a time slot, it might need re-elevation as well otherwise statistics might be wrong.

Now, I can query any one of the redundant statistics runner. How ever, I need to account for statistics runner that are temporarily overloaded or didn't receive all updates from the aggregators, because they will return a different and wrong statistics.

- Chris December 09, 2017 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

It's actually a sliding window function of streaming processing system like Flink, Storm, etc.
To store the data, use a fifo list to store every element with timestamp. and each time a new element comes to the tail of the list, evict the expired elements using timestamp and update the sum.
use map reduce like function to shard messages to multiple machines, and sort them perhaps or keep a min-heap of top K.

- zyfo2 December 09, 2017 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More