Print a Random word from a fil

Adobe Interview Question for Software Analysts

1

of 1 vote

19
Answers
Print a Random word from a file. Input is "path to a file",

constraints- No extra memory like hashing etc. All the words in the file should have equal probability.
- Shobhit November 15, 2012 in United States | Report Duplicate | Flag | PURGE
Adobe Software Analyst Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 2 vote

Assumptions:
1. Word here means the english word and not 8bytes.(seperated by a single space character)
2. Number of words in the file < RAND_MAX.
Here's what you can do:
1. Count the number of spaces (say x)in the file => no of. words = x + 1
2. y = rand() % (x + 1)
3. return the word after y-1 spaces

- The Artist November 15, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

According to question, every word should have equal probability of being selected. So if a word appear more than one times then above method will increase probability of that repeated word.

- akashj November 15, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

i did not think of the duplicate words...in that case with no extra space i could not think of any solution :(

- The Artist November 15, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

the question was not a random unique word. think about it. if all words needed to have equal probability per unique word, you would need to keep track of the count of the words. There simply is no other way to do it.

- charley murphey November 16, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I've read it from algogeeks : "rand()%n is not uniformly distributed between 0 and n-1. To see this, suppose rand() produces positive 32-bit integers, so they have the range 0 to 2^31 - 1 = 2,147,483,647. Now, suppose there are 100,000 words in the file. When you are computing rand()%100000, you will get each of the numbers 0 to 83,646 on average 21,475 times out of 2^31 random numbers, but you will get the numbers 83,647 to 99,999 only 21,474 times out of every 2^31 random numbers. Thus, the first numbers in the file will appear with slightly too high a probability and the numbers at the end of the file will be selected with slightly too low a probability. "
I agree with this . So , any other solution??

- Barney November 16, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Barney, I think you're focusing on the wrong issue here. rand() % n is very close to being distributed uniformly at random for small n. If you need the probability to be exactly equal, there are algorithms for selecting a number uniformly at random between 0 and X given a generator for generating numbers uniformly at random between 0 and Y. I think the point of the problem was to focus on the aspects of the problem mentioned above.

- eugene.yarovoi November 16, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@eugene : ok .
But to find the value of n , first we've to traverse the whole file and count the number of words . Is it possible to print the random word in just 1 traverse ???

- Barney November 16, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Barney: Sure, you can use a technique known as "Reservoir Sampling". You can read about it in detail on Wikipedia.

The basic approach is that at any point you have a current random choice (which in the beginning is initialized to the first word), and as you see each additional word, you replace the current random choice with the new word you just saw with 1/k probability, if k is the number of words you've seen so far.

After reading the first word, the random choice is the first word with 100% probability
After reading the second word, make it the new random choice with 50% probability. So now the probability distribution is 50% first word, 50% second word.
After reading the third word, make it the new random choice with 1/3 probability. This means the two previous words have 1 - 1/3 = 2/3 probability of remaining selected if they were selected already, which is true with a probability of 1/2 for each word, so the probability is now (2/3)*(1/2) = 1/3 for each of the first two words as well. So every word has probability 1/3 of being chosen.

The statement that after k words, every word will have a 1/k chance of being the selected random word can be formally proven by induction. k=1 is the trivial base case and is true by construction.

For k >= 2, we opt to select the k-th word with probability 1/k, and by induction, every previous word was selected with probability 1/(k-1). Any previously selected word now has probability 1 - 1/k = (k-1)/k probability of remaining selected, and had 1/(k-1) probability of being selected in the first place, so it now has 1/k probability of being the random word. So all words now have probability 1/k of being the random word.

- eugene.yarovoi November 17, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@eugene.yarovoi but how to ensure that the probability of a random number to be selected is 1/k??

- Karthik Vvs November 17, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Karthik: Are you asking for an algorithm to give you a bit whose value is 1 with probability 1/k and 0 otherwise?

- eugene.yarovoi November 17, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

ya and how to calculate probability of a word and output it in programming?

- Karthik Vvs November 17, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@eugene In Reservoir Sampling, how to replace a 'random' word and what if that word is encountered afterwards during traversing?

- akashj November 17, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@akashi: This algorithm isn't meant to have an equal chance of picking each distinct word, only each word. You won't be able to do it without O(n) space if you need an equal chance for each distinct word. I see no reason to assume that interpretation of the question, however.

@Karthik Vvs: what do you mean by "output it in programming?"

- eugene.yarovoi November 17, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Karthik: I assume we already have a random number generator that generates random numbers between 0 and some number X (such as the C rand() function, for which X is RAND_MAX).
If you want to know how, given a random number generator that generates numbers uniformly at random between 0 and some number X, we can generate numbers uniformly at random between 0 and a given number Y, see this CareerCup question I answered a while back: www .careercup. com/question?id=12426697. It contains a special case of what can be made into a more general algorithm.

I think talking about this issue is probably out-of-scope for this question though. Many programming languages already provide convenience functions to do this sort of thing, implemented similarly to how I solve the problem in the link I gave.

- eugene.yarovoi November 18, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 2 vote

I think just create number like m= Interger.Max and do r= random(m); and r%length should be ok

- Yahoo November 16, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

If it is possible to load entire file into memory, we can do something like this:

public void shuffleArray(String[] words) {
String temp;
int index, totalCards = words.length;
for (int i = 0; i < totalCards; i++){
index = (int) (Math.random() * (totalCards - i)) + i;
temp = words[i];
words[i] = words[index];
words[index] = temp;
System.out.print(words[i]+" ");
}
}

- Andi November 17, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Andi, I think you misunderstand the question. I believe the question is to choose a single word uniformly at random from a file.

- eugene.yarovoi November 17, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Here are few steps
1. Read the file line by line
2. using strtok, split each line using space as delimiter and collect all the words in an array (either static or dynamic array)
3. After we collect all the Words, choose a random number between Array length and get the random word.
my_words[random(Length<my_words> )]

- unknownMind November 19, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Print a Random word from a file. Input is "path to a file",

constraints- """"No extra memory like hashing etc."""" All the words in the file should have equal probability.

if you didnt read the question properly

- Anonymous November 25, 2012 | Flag

CareerCup

Adobe Interview Question for Software Analysts

Books

Videos

Resume Review

Mock Interviews