iGottaBeKidding
BAN USERThere are N number of machines,
Determine top 10 in every machine.
Each machine transmits its top 10 to every other machine i.e (n-1)10 urls
This Implies each machine also recieves (n-1)10 urls
Process these and determine the top 10 in every machine.
Each machine now sends its Top 10 to a single place,(n*10)
The top 10 among these (n*10) will be the most visited URL's
I do not know if there is any way to avoid not replicating the data on each machine. But will N*(N-1)*10 url's be too much traffic for the network to handle, cos that will be the total number of replications required.
Other possible solutions I could think of was using P-S pattern to publish count of URL's periodically.
I think DFS and Top sort has to be modified (Pardon me if you have, i haven't seen your code yet).
Also, top sort can have several different output's too. Here it seems like we are supposed to be gettinga one particular result.
Ex:
<R,A>,<A,B><A,C><B,D><D,A>
The required answer would be
R
\tA
\tD
\t\tB
\t\tC
While DFS or Top Sort would give
R
\tA
\t\tB
\t\tC
\t\t\tD
There could be a few follow up questions to be asked for the interviewer here. like will cyclic graphs be allowed
<A,B><B,C><C,A>
whats the expected output for this?
My bad, had not read any answers, before replying.
- iGottaBeKidding September 28, 2013