## Goldman Sachs Interview Question

**Team:** Analysts, Strats division

**Country:** United States

**Interview Type:** Phone Interview

In the without-replacement case, every draw reduces the size of the set by one. As a result, successive draws come from n, n-1, n-2, n-3, ... options, so any particular ordered sequence of all n draws has probability 1/n!.

Suppose you pick m.

Let X be the number of draws needed to get a number smaller than m.

X = 1 + Z(m+1) + Z(m+2) + ... + Z(n), where Z(j) is 1 if j is drawn before any number smaller than m, and 0 otherwise.

E(X) = 1 + (n-m)*E(Z), since all the E(Z(j)) are equal.

Also, E(Z) = 1/((m-1)+1) = 1/m, since j comes before all m-1 smaller numbers with probability 1/m.

Therefore E(X) = n/m.
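This indicator argument can be sanity-checked with a quick Monte Carlo sketch (my own addition; I assume the hat holds 1..n and the picked m is set aside before drawing, and `draws_until_smaller` is a name I made up):

```python
import random

def draws_until_smaller(n, m):
    """Pick m, then draw the remaining n-1 numbers without replacement
    until one smaller than m appears; return the number of draws."""
    rest = [v for v in range(1, n + 1) if v != m]
    random.shuffle(rest)
    for i, v in enumerate(rest, start=1):
        if v < m:
            return i

trials = 200_000
n, m = 10, 4
avg = sum(draws_until_smaller(n, m) for _ in range(trials)) / trials
print(avg)  # should come out close to n/m = 2.5
```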

Also, your answer for the second case is less than for the first case, which doesn't make sense with your reasoning: the numbers greater than m are removed here, so the number of picks will be reduced.

The number you pick, m, is still a random variable. What you gave, n/(m-1), is E(Y | m), where Y is the number of times you need to pick in order to find a number smaller than m. Ultimately, we need to find E(Y) = E(E(Y|m)) = E(n/(m-1)). Since m ranges from 2 to n, each with probability 1/n, we get

E(n/(m-1)) = sum over m = 2..n of n/(m-1) * 1/n = 1 + 1/2 + 1/3 + ... + 1/(n-1),

which is a partial sum of the harmonic series.
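A tiny script (my addition) confirms that this conditional average equals the partial harmonic sum:

```python
def expected_picks(n):
    # E[n/(m-1)] with m uniform on 2..n, each with probability 1/n
    return sum(n / (m - 1) * (1 / n) for m in range(2, n + 1))

harmonic = sum(1 / j for j in range(1, 5))  # 1 + 1/2 + 1/3 + 1/4
print(expected_picks(5), harmonic)
```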

For the second case it gets messier; it is hard to find a closed form.

can't believe this is a phone interview.

I have solved this problem partially but I am unable to continue at this point:

case 1:

There are 2 approaches: logical and recursive.

If you pick a number i, the expected number of draws to find a number smaller than the one you have is:

Exp(num < i) = 1*(i-1)/(n-1) + 2*(n-i)(i-1)/(n-1)^2 + 3*(n-i)^2*(i-1)/(n-1)^3 + ...

which reduces to Exp(num < i) = (n-1)/(i-1)

Also, recursively

E(i) = 1*(i-1)/(n-1) + (E(i)+1)*(n-i)/(n-1)

When you solve it, you get

E(i) = (n-1)/(i-1)
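A quick simulation (my addition; it assumes each repeat draw is uniform over the n-1 numbers other than i, with replacement, and `picks_until_smaller` is a made-up helper) agrees with E(i) = (n-1)/(i-1), which is 3 for n = 10, i = 4:

```python
import random

def picks_until_smaller(n, i):
    # Draw with replacement from the n-1 numbers other than i
    # until one is smaller than i; count the draws.
    others = [v for v in range(1, n + 1) if v != i]
    count = 0
    while True:
        count += 1
        if random.choice(others) < i:
            return count

trials = 200_000
n, i = 10, 4
avg = sum(picks_until_smaller(n, i) for _ in range(trials)) / trials
print(avg)  # close to (n-1)/(i-1) = 3
```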

Now, this is where I get stuck.

To find the final expectation value:

E = (1/n) * sum of E(i) over i = 1:n = infinite :/ (the i = 1 term diverges)

And if you exclude the case of picking 1, you get a harmonic sum. So, what do I do next?

case 2:

This doesn't lead to infinity, but it poses a logical question: you know that if you pick the number 1, then even if we look through the remaining n-1 chits, there is nothing you can find smaller than 1. So it's a failure, meaning you never find it; it's not that you find it after n-1 picks, so we can't count it as any finite number.

So what do I do for that?

Continuing: if I exclude 1 and try to solve it logically (since I couldn't get the recursion formula):

If you pick

n -> 1

n-1 -> 1*(n-2)/(n-1) + 2*1/(n-1)

n-2 -> 1*(n-3)/(n-1) + 2*2(n-3)/((n-1)(n-2)) + 3*2/((n-1)(n-2))

and so on

2 -> 1*1/(n-1) + 2*(n-2)*1/((n-1)(n-2)) + 3*(n-2)(n-3)*1/((n-1)(n-2)(n-3)) + ...

How do I simplify it?
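For what it's worth, the last series above (picking 2) can be evaluated exactly (my sketch, using exact fractions): with only the single number 1 counting as a success among the n-1 remaining chits, the success is equally likely to land at any of the n-1 positions, so each P(X = j) is 1/(n-1) and the expectation is n/2.

```python
from fractions import Fraction

def p_xj(n, j):
    # P(the single good number "1" appears at draw j) when the hat
    # holds n-1 chits: one good, n-2 bad, drawn without replacement
    p = Fraction(1)
    for t in range(j - 1):
        p *= Fraction(n - 2 - t, n - 1 - t)   # draw t+1 is bad
    return p * Fraction(1, n - 1 - (j - 1))   # draw j is the good one

n = 8
ev = sum(j * p_xj(n, j) for j in range(1, n))
print(ev)  # n/2 = 4
```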

It's 2.

The expectation is

1*(K-1)/(n-1) + 2*(n-K)/(n-1) *(K-1)/(n-1) + 3 * ((n-K)/(n-1))^2 * (K-1)/(n-1) +4*......

Because K is randomly chosen, K's expectation is n/2;

then (n-K) and (K-1) are roughly n/2 as well, so each ratio is roughly 1/2.

The above expression then becomes the function f:

f = 1*(1/2) + 2*(1/2)^2 + 3*(1/2)^3 + ...

We can solve it by setting f = x; then 2x = 1*1 + 2*(1/2) + 3*(1/2)^2 + ...

2x - x = x = 1 + 1*(1/2) + 1*(1/2)^2 + 1*(1/2)^3 + ... = 2

x = 2
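The series can be double-checked numerically (my addition); the partial sums of k*(1/2)^k converge to the geometric closed form x/(1-x)^2 evaluated at x = 1/2:

```python
def weighted_geom_sum(x, terms=200):
    # sum_{k=1..terms} of k * x^k, which converges to x / (1 - x)^2 for |x| < 1
    return sum(k * x**k for k in range(1, terms + 1))

f = weighted_geom_sum(0.5)
print(f)  # converges to 0.5 / 0.25 = 2
```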

Nice solution. Your answer is almost correct. It is not trivial to find the exact value of the expectation, especially for the case with replacement.

There is a minor flaw (in my opinion) in your argument. Mathematically speaking, the problem involves choosing a random number "m" and then finding the number of picks which follows until we obtain a number less than "m", e.g., "k" picks. For each "m", we have a specific distribution for "k". Then, the answer would be to calculate E[k|m] for fixed "m" with respect to "k", and then find the expected value of the result with respect to "m".

You are instead finding E[k|E[m]] which is different from E[E[k|m]].

Another perspective: the function you have above is nonlinear in "K", which is another reason you shouldn't just plug in K = n/2 and solve the problem.

Short answer: 2 + O(1 / n).

Reason: Note that picking a larger number is as probable as picking a smaller number. For large n, picking the same number has a chance of 1 / n. So ignore it. Then, at each pick you might pick a larger number with probability 1 / 2 and a smaller number with probability 1 / 2. Therefore, the average number of picks in both cases is almost 2.

This approximation is shaky if n is not large, e.g., n = 2, 3, 4.

NOTE: Also, there is a technical problem with the current question.

If you are looking for a number less than the initial pick you should note that if the first pick is "1", there is no way to find a smaller number. Then, the average number of picks will be infinity.

My first thought:

Let X be random variable counting the number of picks until we get success (that is, we get success on Xth pick).

Case 1) With replacement.

--------------------------------------

Probability of success on any trial is always (k-1)/n (since k-1 nums < k)

Let p= ( that junk )

So PMF of X is:

f(x) = (probability of x-1 failures)*(probability of success on xth trial)

= (1-p)^(x-1) * p

So,

E[X] = SUM_to_inf {x*f(x) } = 1/p (use derivative of geom. series to prove)

so E[X] = n/(k-1)
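A numeric sketch (my addition; `geometric_mean` is a made-up helper) of E[X] = 1/p for this geometric distribution:

```python
def geometric_mean(n, k, terms=5000):
    # E[X] = sum of x * (1-p)^(x-1) * p for a geometric variable with p = (k-1)/n
    p = (k - 1) / n
    return sum(x * (1 - p)**(x - 1) * p for x in range(1, terms + 1))

print(geometric_mean(10, 4))  # close to n/(k-1) = 10/3
```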

Case 2) Without replacement.

----------------------------------------

PMF of X is:

f(x) = (probability of x-1 failed picks)*(probability of a success pick after)

= (a)*(b) <-- defining a and b for convenience

```
a = (number of ways to choose "x-1" nums out of the "n-k+1" bad nums)
    -----------------------------------------------------------------
    (number of ways to choose "x-1" nums out of the full n nums)

a = (n-k+1)C(x-1)   <--- binomial coefficient
    -------------
      (n)C(x-1)
```

Now for b:

x-1 nums have been removed from n choices, so we have n-(x-1) = n-x+1 leftover nums to pick from

b = (k-1) / (n-x+1) <--- probability of picking a good num (one of the k-1 nums smaller than k) from the leftovers

so the PMF looks like:

```
f(x) = (n-k+1)C(x-1)     (k-1)
       -------------  *  ---------
        (n)C(x-1)        (n-x+1)
```

but X is not an infinite random variable, since

x is limited to 1, 2, ..., (n-k+2)

So we are done,

E[X] = SUM_{x=1 to (n-k+2)} x*f(x) <-- plug junk in for f(x)

I am not obligated to simplify this further for Goldman Sachs because it is a finite sum.

I didn't check my work above (cross my fingers).
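A quick numeric check of that work (my addition). Note the success draw should use the k-1 numbers smaller than k, i.e. b = (k-1)/(n-x+1); with that, the PMF sums to 1 and the mean matches the indicator-variable result 1 + (n-k+1)/k:

```python
from math import comb

def pmf(n, k, x):
    # P(first x-1 draws are all among the n-k+1 nums >= k, then a num < k)
    return comb(n - k + 1, x - 1) / comb(n, x - 1) * (k - 1) / (n - x + 1)

n, k = 10, 4
total = sum(pmf(n, k, x) for x in range(1, n - k + 3))
ev = sum(x * pmf(n, k, x) for x in range(1, n - k + 3))
print(total, ev)  # total is 1; ev = 1 + (n-k+1)/k = 2.75
```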


Uggh... hate the formatting issues.. repeat:

```
P(first x-1 choices were all bad choices)=
a =
(n-k+1)C(x-1) <--- binomial coefficient
-------------------
(n)C(x-1)
```

Now for b:

x-1 nums have been removed from n choices, so we have n-(x-1) = n-x+1 leftover nums to pick from

b = (k-1) / (n-x+1) <--- probability of picking a good num from the leftovers

so PMF looks like:

```
P(X=x) = f(x) =
(n-k+1)C(x-1)      (k-1)
-------------  *  ---------
 (n)C(x-1)        (n-x+1)
```

```
m: number of times
replace:
p = (k - 1) / n;
Pr(number of times = m) = (1 - p)^(m - 1) * p
E(m) = sum((1 - p)^(m - 1) * p * m), m = 1,2,...
no replace:
number of times = m
1 <= m <= n - k + 2
Pr(number of times = m) = (n - k + 1) / n * (n - k) / (n - 1) * ... (n - k - m + 3) / (n - m + 2) * (k - 1) / (n - m + 1)
E(m) = sum(Pr(number of times = m) * m), m = 1, 2, .. n - k + 2
```
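The without-replacement product formula above can be verified numerically (my sketch; `pr_no_replace` is a name I made up):

```python
def pr_no_replace(n, k, m):
    # Pr(number of picks = m): m-1 failures (nums >= k), then a success (num < k)
    p = 1.0
    for j in range(m - 1):
        p *= (n - k + 1 - j) / (n - j)
    return p * (k - 1) / (n - m + 1)

n, k = 10, 4
ev = sum(m * pr_no_replace(n, k, m) for m in range(1, n - k + 3))
print(ev)  # matches 1 + (n-k+1)/k = 2.75
```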

With replacement:

We need to find E(X), where X is the number of rounds until success: in each round I pick a number m <= n and draw a number k from the hat, and success means k < m.

Let p = P(I pick a number m <= n, draw a number k from the hat, and k < m).

E(X) = p*(1 + 0 {we have already reached success in the first round}) + (1-p){probability of failure}*(1 + E(X) {we need to repeat the round}),

i.e. E(X) = 1/p.

Now, we need to find the value of p.

Let's say A is the event of picking a random number m <= n,

and B is the event of drawing a number k from the hat which is less than m.

We can calculate p by adding all possibilities of success, i.e. picking 1, and drawing < 1, or picking 2 and drawing < 2 or picking 3 and drawing < 3 ......... or picking n and drawing < n

P( Picking a number m from 1 to n ) = 1/n

P( drawing a number from hat < m ) = (m-1)/n

Therefore,

p = 1/n*( (1-1)/n) + 1/n*((2-1)/n) + 1/n*((3-1)/n) + ...... + 1/n*((n-1)/n)

p = 1/n^2 * ( 0 + 1 + 2 + 3 + ... + (n-1) ) = n(n-1)/(2n^2) = (n-1)/(2n)

And hence E(X) = 1/p = 2n/(n-1)
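A simulation sketch (my addition) of this model, where both the pick m and the draw k are refreshed every round:

```python
import random

def rounds_until_smaller(n):
    # Each round independently: pick m uniform on 1..n and draw k uniform
    # on 1..n (both with replacement); stop when k < m.
    count = 0
    while True:
        count += 1
        m = random.randint(1, n)
        k = random.randint(1, n)
        if k < m:
            return count

trials = 200_000
n = 10
avg = sum(rounds_until_smaller(n) for _ in range(trials)) / trials
print(avg)  # close to 2n/(n-1) = 20/9
```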

This should be correct for the version without replacement (the simulation below removes each drawn chit, with the first number x held fixed), validated with a simulation in Python:

```
import numpy as np
import time

def simulate(n, x):
    arr = list(range(1, n+1))
    count = 0
    while True:
        draw_idx = np.random.randint(low=0, high=len(arr))  # exclusive end
        draw = arr[draw_idx]
        arr = arr[:draw_idx] + arr[draw_idx+1:]
        count += 1
        if draw < x:
            return count

def test(num_trials, n, x):
    results = []
    t0 = time.time()
    for i in range(num_trials):
        if i % 10000 == 0:
            print("%d k, time elapsed %.2f m" % (i // 1000, (time.time() - t0) / 60))
        results.append(simulate(n, x))
    return np.mean(results)

def get_prob(n, x, num_toss):
    cumprod = 1.0
    for term_i in range(1, num_toss + 1):
        if term_i == 1:
            term = (x - 1) * 1.0 / (n - (num_toss - 1))
        else:
            term = (n - (x - 1 + num_toss - term_i)) * 1.0 / (n - (num_toss - 1) + (term_i - 1))
        if term <= 0:
            return 0
        # print(term_i, term)
        cumprod *= term
    return cumprod

def cal(n, x):
    expectation = 0
    for num_toss in range(1, n):
        p = get_prob(n, x, num_toss)
        expectation += p * num_toss
    return expectation

num_trials = 8 * 10**5
n = 11
x = 8
if True:
    print(test(num_trials, n, x))
    print(cal(n, x))
# both give 1.5
```

If getting a number smaller than the number that is picked first is considered a success, and otherwise a failure, then these events become Bernoulli trials.

- Murali Mohan, August 23, 2013

The question asks:

>> Find the expectation value of *number of times* you need to pick numbers...

The answer to the question: "How many trials before a success" is provided by geometric distribution, whose mean, E[X] = 1/p, where p is the probability of success.

Case 1: If you replace

Let m be the number picked; then the probability of choosing a smaller number is (m-1)/n. The expected value is therefore 1/((m-1)/n) = n/(m-1).

Case 2: if you don't replace

Let m be the number picked; then the probability of choosing a smaller number on a given draw is roughly (m-1)/(n-1), so the expected value comes out to 1/((m-1)/(n-1)) = (n-1)/(m-1). (Strictly speaking, without replacement the success probability changes from draw to draw, so the picks are not identical Bernoulli trials and this is only an approximation.)