Google Interview Question
Software Engineer / Developersint i=0;
int k = 1 Million;
String[] hinQ = new String[K];
while(reader has not reached eof)
{
String querry = reader.NextLine();
if(getLanuage(querry) == 'Hindi')
{
i++;
if(i < K)hinQ[i]=querry;
else if(random(i)<K)
{
hinQ[random(K)]= querry;
}
}
}
// Also note that if memory space is not enough to hold 1 million querry, you can copy the same file to different location and limit your K and i (where to start and where to end except the last)
Using Reservoir Sampling, you could basically read from the stream of search queries and add them with probablity = k/k+i, where k is 1 million and i denotes the ith hindi query after the kth hindi query. You are ignoring the non-hindi queries as you read from the stream. When you have read all the queries the k queries you have are randomly selected hindi queries.
- Rajesh Konda June 29, 2009