Google Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: In-Person
simulate an array shuffle, with a virtual array(only save the length, and the 1st element).
When we scan to index n, the 1st element of the virtual array, would be any number in x0...xn with probability 1/n
the shuffle function works like,
when scan to xi, there are i+1 choices to insert it to the virtual array. randomInt(i+1)
if random number is 0, replace the 1st element.
else do nothing.
The reference to "an infinite stream of ..." implies that we cannot simply consider a uniform distribution as in the case of discrete random variables. In other words, we cannot assume 1/n as the probability assigned to each element. One thing we can do, however, is to check if the lower and the upper bound of these numbers are available. Then, we can use something like:
(b-a)rnd() + a
to find the next probability. Such a boundaries can be found on-the-fly i.e., as the numbers are arriving we can update the previous bounds, given the recently received value.
Divide the streams into partitions and each partition contains set of numbers.
Now, genRand() to the pick the partition and once the partition is picked, then again do genRand() now to pick the number.
Now, the prob (number) = 1/(n^2) .
We can even extend to more complex (distributed).
1. genRand() to pick a node
2. Each node has many partitions and now genRand() to pick a partition within a node
3. Now, once the partition is pick , genRand() to pick the number,
In this, prob(num) = 1/(SPN) where S = # of nodes , P = # of partitions with each node (assume same for all nodes) and N = # of elements in each partitions.
Divide the streams into partitions and each partition contains set of numbers.
Now, genRand() to the pick the partition and once the partition is picked, then again do genRand() now to pick the number.
Now, the prob (number) = 1/(n^2) .
We can even extend to more complex (distributed).
1. genRand() to pick a node
2. Each node has many partitions and now genRand() to pick a partition within a node
3. Now, once the partition is pick , genRand() to pick the number,
In this, prob(num) = 1/(SPN) where S = # of nodes , P = # of partitions with each node (assume same for all nodes) and N = # of elements in each partitions.
public int GetRandom()
{
var list = new List<int> { 5, 8, 9, 10, 12, 15, 20 };
var currentLocation = 0;
var randomNo = list[0];
//We don't count the list so it is as good an infinite stream
foreach (var item in list)
{
currentLocation += 1;
if (currentLocation == 1) continue;
//Return a no between 1 and the last item in the list. The probability for the new no to be picked get's less and less
//as we go through the list and all no has equal chance.
int newRandomLocation = new Random().Next(currentLocation) + 1;
randomNo = newRandomLocation == currentLocation ? item : randomNo;
}
return randomNo;
}
This is reservoir sampling where k == 0
int singleReservoirSampling(std::vector<int> &arr) {
int result = arr[0]; // use single var instead of array as k == 0
for(int i = 1; i < arr.size(); ++i) {
int r = std::rand() % i;
if(r == 0) { // replacing r <= k with r == 0
result = arr[i]; // probability to be set is 1 to i
}
}
return result;
}
static int stream_sample(int* numbers)
{
int k = 0;
int selected = 0;
while (*numbers != 0)
{
++k;
if (rand() * k < 1)
{
selected = *numbers;
}
numbers++;
}
return selected;
}
static int stream_sample(int* numbers)
{
double cur = 1;
double selected = 0;
while (*numbers != 0)
{
double new_cur = rand();
if (new_cur < cur)
{
cur = new_cur;
selected = *numbers;
}
numbers++;
}
return selected;
}
Analysis :
In the ith iteration we decide whether ith element should be the random element.
P(candidate = i) : 1/i
Probability that the ith element so chosen is the random element :
P(candidate=i)*P(it is never replaced)
In ith iteration : P(random element which is replaced) : 1/i
P(it is not replaced) : 1 - 1/i
P(pth element is random) :
P(chosen p) * P(never replaced random in p+1th iteration to final iteration )
= 1/p * Product{i = p+1 to n} (1-i)/i = 1/n .
Check Reservoir Sampling on wiki please. In this case k = 1.
- joe kidd September 26, 2013