## Adobe Interview Question for Software Analysts

Country: United States

Comment hidden because of low score. Click to expand.
2
of 2 vote

Assumptions:
1. Word here means the english word and not 8bytes.(seperated by a single space character)
2. Number of words in the file < RAND_MAX.
Here's what you can do:
1. Count the number of spaces (say x)in the file => no of. words = x + 1
2. y = rand() % (x + 1)
3. return the word after y-1 spaces

Comment hidden because of low score. Click to expand.
0

According to question, every word should have equal probability of being selected. So if a word appear more than one times then above method will increase probability of that repeated word.

Comment hidden because of low score. Click to expand.
0

i did not think of the duplicate words...in that case with no extra space i could not think of any solution :(

Comment hidden because of low score. Click to expand.
0

the question was not a random unique word. think about it. if all words needed to have equal probability per unique word, you would need to keep track of the count of the words. There simply is no other way to do it.

Comment hidden because of low score. Click to expand.
0

I've read it from algogeeks : "rand()%n is not uniformly distributed between 0 and n-1. To see this, suppose rand() produces positive 32-bit integers, so they have the range 0 to 2^31 - 1 = 2,147,483,647. Now, suppose there are 100,000 words in the file. When you are computing rand()%100000, you will get each of the numbers 0 to 83,646 on average 21,475 times out of 2^31 random numbers, but you will get the numbers 83,647 to 99,999 only 21,474 times out of every 2^31 random numbers. Thus, the first numbers in the file will appear with slightly too high a probability and the numbers at the end of the file will be selected with slightly too low a probability. "
I agree with this . So , any other solution??

Comment hidden because of low score. Click to expand.
0

Barney, I think you're focusing on the wrong issue here. rand() % n is very close to being distributed uniformly at random for small n. If you need the probability to be exactly equal, there are algorithms for selecting a number uniformly at random between 0 and X given a generator for generating numbers uniformly at random between 0 and Y. I think the point of the problem was to focus on the aspects of the problem mentioned above.

Comment hidden because of low score. Click to expand.
0

@eugene : ok .
But to find the value of n , first we've to traverse the whole file and count the number of words . Is it possible to print the random word in just 1 traverse ???

Comment hidden because of low score. Click to expand.
0

@Barney: Sure, you can use a technique known as "Reservoir Sampling". You can read about it in detail on Wikipedia.

The basic approach is that at any point you have a current random choice (which in the beginning is initialized to the first word), and as you see each additional word, you replace the current random choice with the new word you just saw with 1/k probability, if k is the number of words you've seen so far.

After reading the first word, the random choice is the first word with 100% probability
After reading the second word, make it the new random choice with 50% probability. So now the probability distribution is 50% first word, 50% second word.
After reading the third word, make it the new random choice with 1/3 probability. This means the two previous words have 1 - 1/3 = 2/3 probability of remaining selected if they were selected already, which is true with a probability of 1/2 for each word, so the probability is now (2/3)*(1/2) = 1/3 for each of the first two words as well. So every word has probability 1/3 of being chosen.

The statement that after k words, every word will have a 1/k chance of being the selected random word can be formally proven by induction. k=1 is the trivial base case and is true by construction.

For k >= 2, we opt to select the k-th word with probability 1/k, and by induction, every previous word was selected with probability 1/(k-1). Any previously selected word now has probability 1 - 1/k = (k-1)/k probability of remaining selected, and had 1/(k-1) probability of being selected in the first place, so it now has 1/k probability of being the random word. So all words now have probability 1/k of being the random word.

Comment hidden because of low score. Click to expand.
0

@eugene.yarovoi but how to ensure that the probability of a random number to be selected is 1/k??

Comment hidden because of low score. Click to expand.
0

@Karthik: Are you asking for an algorithm to give you a bit whose value is 1 with probability 1/k and 0 otherwise?

Comment hidden because of low score. Click to expand.
0

ya and how to calculate probability of a word and output it in programming?

Comment hidden because of low score. Click to expand.
0

@eugene In Reservoir Sampling, how to replace a 'random' word and what if that word is encountered afterwards during traversing?

Comment hidden because of low score. Click to expand.
0

@akashi: This algorithm isn't meant to have an equal chance of picking each distinct word, only each word. You won't be able to do it without O(n) space if you need an equal chance for each distinct word. I see no reason to assume that interpretation of the question, however.

@Karthik Vvs: what do you mean by "output it in programming?"

Comment hidden because of low score. Click to expand.
0

@Karthik: I assume we already have a random number generator that generates random numbers between 0 and some number X (such as the C rand() function, for which X is RAND_MAX).
If you want to know how, given a random number generator that generates numbers uniformly at random between 0 and some number X, we can generate numbers uniformly at random between 0 and a given number Y, see this CareerCup question I answered a while back: www .careercup. com/question?id=12426697. It contains a special case of what can be made into a more general algorithm.

I think talking about this issue is probably out-of-scope for this question though. Many programming languages already provide convenience functions to do this sort of thing, implemented similarly to how I solve the problem in the link I gave.

Comment hidden because of low score. Click to expand.
0
of 2 vote

I think just create number like m= Interger.Max and do r= random(m); and r%length should be ok

Comment hidden because of low score. Click to expand.
0

If it is possible to load entire file into memory, we can do something like this:

public void shuffleArray(String[] words) {
String temp;
int index, totalCards = words.length;
for (int i = 0; i < totalCards; i++){
index = (int) (Math.random() * (totalCards - i)) + i;
temp = words[i];
words[i] = words[index];
words[index] = temp;
System.out.print(words[i]+" ");
}
}

Comment hidden because of low score. Click to expand.
0

Andi, I think you misunderstand the question. I believe the question is to choose a single word uniformly at random from a file.

Comment hidden because of low score. Click to expand.
0
of 0 vote

Here are few steps
1. Read the file line by line
2. using strtok, split each line using space as delimiter and collect all the words in an array (either static or dynamic array)
3. After we collect all the Words, choose a random number between Array length and get the random word.
my_words[random(Length<my_words> )]

Comment hidden because of low score. Click to expand.
0

Print a Random word from a file. Input is "path to a file",

constraints- """"No extra memory like hashing etc."""" All the words in the file should have equal probability.

if you didnt read the question properly

Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

### Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

### Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.