Google Interview Question







The question is about scale, not simply data structures...

Assume all web requests are from unique users, which is the worst case for the computation. Operationally, each web server in the fleet needs these processes:

1. Accumulator: adds up the number of web requests and sends the count to a timebound computing server. A random delay is added before reporting this stat so that the computing server has time to process requests over a period of time.
2. Stat listener: fetches the current stat data from a timebound computing server.

The timebound computing servers (let's say 600 servers exist, each counting request data for 6 seconds, round-robined over 1 hour) use the time they have to sort the requests by timestamp. When a server finishes counting, it also fetches the running total from the previous server in the round-robin and adds its own data to that list. If it sees the count cross a billion, it knows exactly which request caused it.

This can now be sent back to the web servers through the stat listener, which could update the unique user on his/her next request.

Alternatively, it can be written to the user's entry in the distributed hash table.
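
Below is a minimal sketch of the crossover check a timebound computing server might run, assuming integer (timestamp, count) reports for its window and the running total handed over by the previous server in the round-robin; all names here are illustrative, not from the original post.

{{{
# Hypothetical sketch: one timebound computing server's crossover check.
# `previous_total` is the running request count handed over by the previous
# server in the round-robin; `reports` is the list of (timestamp, count)
# pairs accumulated by web-server accumulators during this window.

TARGET = 1_000_000_000

def find_crossover(previous_total, reports):
    """Sort this window's reports by timestamp and detect whether the
    cumulative count crosses TARGET. Returns (new_total, crossing_report)."""
    running = previous_total
    for timestamp, count in sorted(reports):            # timestamp-wise sort
        if running < TARGET <= running + count:
            return running + count, (timestamp, count)  # this batch crossed it
        running += count
    return running, None                                # no crossover in this window

# Example: the previous server hands over 999,999,990 requests.
total, hit = find_crossover(999_999_990, [(1001, 4), (1000, 3), (1002, 5)])
print(total, hit)  # 1000000002 (1002, 5) -- the report that crossed a billion
}}}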

- Anonymous July 10, 2011

I understand this question to be about finding the billionth user to hit Google's servers. This can be achieved with the technique used to find the median of n numbers in a distributed architecture.

If the server logs are sorted by timestamp (which they usually are), each server finds its median and sends back the count of entries up to that median to a master server. The master server sums these counts. If the sum is less than a billion, it discards the values to the left of the medians and subtracts the total from a billion (call this x, where x = billion - total). It then recursively looks for the x-th smallest timestamp in the undiscarded set.
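
The post describes the pruning only loosely; one way to make it concrete is a distributed rank-selection that binary-searches on the timestamp value (assuming integer timestamps) and asks every server for a count, roughly as sketched below. In-memory sorted lists stand in for the per-server logs, and all names are illustrative.

{{{
import bisect, random

# Hedged sketch of distributed selection of the k-th smallest timestamp.
# Each entry in `servers` is a sorted list of timestamps standing in for one
# server's log; in a real system count_leq() would be an RPC to that server.

def count_leq(server_log, pivot):
    """How many entries on this server have timestamp <= pivot."""
    return bisect.bisect_right(server_log, pivot)

def kth_smallest(servers, k):
    """Smallest timestamp t such that at least k entries are <= t,
    i.e. the k-th smallest timestamp overall (1-based)."""
    lo = min(log[0] for log in servers if log)
    hi = max(log[-1] for log in servers if log)
    while lo < hi:
        pivot = (lo + hi) // 2
        rank = sum(count_leq(log, pivot) for log in servers)
        if rank >= k:
            hi = pivot        # the k-th smallest is at or below the pivot
        else:
            lo = pivot + 1    # it is strictly above the pivot
    return lo

# Toy example with 3 "servers"; the billionth user would use k = 10**9.
servers = [sorted(random.sample(range(1_000), 20)) for _ in range(3)]
print(kth_smallest(servers, 7))
}}}

Each round costs one count query per server, and the number of rounds is logarithmic in the timestamp range rather than in the data size.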

- biswajit.86 April 23, 2014

The central server posts a query to each server, something like "send me the count of users who visited between t1 and t2," where t1 and t2 are the start and end times (assuming all servers are synchronized to a single clock). The central server adds up all these counts and checks whether the total exceeds 1 billion. If not, it sends the query again after some time, asking for the count between t2 and t3. The time intervals can be chosen dynamically by the central server based on the cumulative count. For example, the first query might ask for the count of users who visited between 2 am and 3 am; if the total is far short, the next query might ask for the count between 3 am and 6 am, in order to reduce the number of times it has to poke every server. This also means each local server has to keep logs of which users visited at what time of day. Of course, log sizes are limited, so we need to make sure the central server probes at the right frequency.

After a series of such requests, one request will return counts that, when added, reach 1 billion or more. Now we know the condition was met between t1 and t2, so we can use a binary-search approach to narrow this interval down to a much smaller one and then compare the timestamps across all the servers to find the billionth user (a sketch of this loop follows the list below). This needs support from the local servers to report the number of users who logged in during a given interval.

Things to consider:
- More servers might get added in between.
- Some servers might have downtime, so how do we handle such cases? Maybe the server where the billionth user logged on just went down.
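
Here is a rough sketch of that polling-plus-binary-search loop, assuming each local server exposes a count_between(t1, t2) call (simulated below with sorted in-memory timestamp lists); the names and the step-widening heuristic are illustrative, not from the original comment.

{{{
import bisect

TARGET = 1_000_000_000   # use a small target to play with a toy example

def count_between(log, t1, t2):
    """Users on this simulated server with t1 <= timestamp < t2."""
    return bisect.bisect_left(log, t2) - bisect.bisect_left(log, t1)

def find_billionth_second(servers, start, end, target=TARGET):
    # Phase 1: walk forward in coarse windows, widening the step while the
    # cumulative count is still far from the target.
    total, t, step = 0, start, 3600          # begin with 1-hour windows
    while t < end:
        c = sum(count_between(log, t, t + step) for log in servers)
        if total + c >= target:
            break                             # the target user is in [t, t + step)
        total, t = total + c, t + step
        if c < (target - total) // 10:        # still far away: widen the window
            step *= 2
    else:
        return None                           # fewer than `target` users overall

    # Phase 2: binary-search within the crossing window for the exact second.
    lo, hi = t, t + step - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if total + sum(count_between(log, t, mid + 1) for log in servers) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo   # the second during which the cumulative count first hits `target`
}}}

Pinning down the exact user still requires the servers to return the individual timestamps for that one second so the central server can merge and order them, as the comment notes.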

- gauravk.18 November 23, 2015

Doubly linked list: when we hit an overflow, we add a new node.
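
It's not spelled out what this list would hold; one charitable reading, sketched below with hypothetical names, is a counter made of fixed-width nodes where an overflowing node carries into a newly appended one (a singly linked chain is enough for the carry).

{{{
# Hypothetical reading of the linked-list answer: a counter split into
# fixed-width nodes; when a node overflows, the carry moves into the next
# node, and a new node is appended if none exists yet.

class Node:
    def __init__(self):
        self.value = 0
        self.next = None

LIMIT = 2 ** 32          # each node holds a 32-bit chunk of the count

def increment(head):
    node = head
    while True:
        node.value += 1
        if node.value < LIMIT:
            return
        node.value = 0               # overflow: carry into the next node
        if node.next is None:
            node.next = Node()       # add a new node on overflow
        node = node.next

# Usage: head = Node(); call increment(head) once per request.
}}}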

- aileen June 02, 2013

What's your problem, dude? Seems you are way too smart, giving some rubbish convoluted answer without much thought! Do you learn by this approach? I doubt it...

- hmm July 10, 2011

@hmm If I had said MinHeap, you could have searched on Google for what a min-heap is and when it's useful; I hope you are not drunk and frustrated with your life :) Anyway, the above idea won't work. The idea is to have distributed local counters and a good way to combine the local counters into one global counter. Another key property is to combine the results of the local counters asynchronously, in order not to slow down the search time.

Each server should keep track of the total number of users, and perform "edits" to this number.

At a specified time interval, servers should synchronize their counters by transmitting their aggregated "edits" to the global total.
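
A minimal sketch of that scheme, assuming a single shared global counter and per-server local counters that flush their buffered edits on a timer; all class and method names here are illustrative.

{{{
import threading, time

# Sketch of "local counters + asynchronous combine": the request path only
# touches the cheap local counter, and a background thread periodically ships
# the accumulated "edits" to the global total.

class GlobalCounter:
    def __init__(self):
        self._lock = threading.Lock()
        self.total = 0

    def apply_edits(self, delta):
        with self._lock:
            self.total += delta
            return self.total

class LocalCounter:
    def __init__(self, global_counter, flush_interval=1.0):
        self._global = global_counter
        self._interval = flush_interval
        self._pending = 0
        self._lock = threading.Lock()
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def record_user(self):
        # Called on the request path: purely local, no cross-server traffic.
        with self._lock:
            self._pending += 1

    def _flush_loop(self):
        # Runs off the request path, combining edits asynchronously.
        while True:
            time.sleep(self._interval)
            with self._lock:
                delta, self._pending = self._pending, 0
            if delta:
                total = self._global.apply_edits(delta)
                if total >= 1_000_000_000:
                    print("the billionth user arrived during the last interval")
}}}

The obvious cost is precision: the global total only says the billionth user arrived somewhere inside the last flush interval, so the per-interval logs are still needed to pin down exactly who it was.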

So a distributed hash table seems like a good idea, where each server has a table with the user name/id as the key and a per-user counter as the value. It's just a thought.

Other approaches are also welcome.

Shashank

- WgpShashank July 16, 2011

