Yahoo Interview Question for Systems Design Engineers


Country: India
Interview Type: Phone Interview




6 of 6 votes

Assuming that File-1 and File-2 have no duplicates within themselves, and that memory is not an issue:
1. Iterate through File-2, adding each string to a HashMap and writing each string to File-3.
2. Iterate through File-1 and check whether each string is present in the HashMap; if it is not, write the string to File-3.
If memory is an issue, you could use a memory-mapped file to store the HashMap strings.
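
A minimal sketch of the two steps above, not the commenter's own code. It assumes the files are small enough to be represented as in-memory lists of strings and that neither list contains internal duplicates; the function name `merge_unique` is made up for illustration, and a Python `set` stands in for the HashMap.

```python
def merge_unique(file2_lines, file1_lines):
    """Return the contents of File-3: all of File-2, then the File-1
    strings that File-2 does not already contain."""
    seen = set()   # stands in for the HashMap of File-2 strings
    file3 = []     # stands in for the output file

    # Step 1: record every File-2 string and write it out.
    for s in file2_lines:
        seen.add(s)
        file3.append(s)

    # Step 2: write a File-1 string only if File-2 did not contain it.
    for s in file1_lines:
        if s not in seen:
            file3.append(s)

    return file3


print(merge_unique(["b", "c"], ["a", "b"]))  # ['b', 'c', 'a']
```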

- CameronWills November 08, 2012

1 of 1 vote

This problem comes down to a time and space trade-off.

1. Memory is the constraint; time is not the prime concern.

Steps:
a. Sort File-1 and File-2 using an external sort.
b. Merge the two sorted files into File-3, writing each distinct string once.

Almost no extra memory is required, but it takes O(n log n + m log m) time.
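
Step b can be sketched as follows, assuming step a has already produced two sorted inputs (the external sort itself is not shown). The name `merge_sorted_unique` is hypothetical; `heapq.merge` from the standard library does the lazy two-way merge, so only one string from each input is held at a time.

```python
import heapq


def merge_sorted_unique(sorted_a, sorted_b):
    """Yield the union of two sorted iterables of strings, each string once."""
    previous = None  # last string written; None never equals a string
    for s in heapq.merge(sorted_a, sorted_b):
        if s != previous:  # equal strings are adjacent after the merge
            yield s
            previous = s


print(list(merge_sorted_unique(["a", "c", "d"], ["b", "c", "e"])))
# ['a', 'b', 'c', 'd', 'e']
```

Because both inputs can be file objects iterated line by line, the same function works without loading either file into memory.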

2. Time is the constraint and memory is not a factor at all.

Steps:
a. Create a hash table that maps hash keys to strings.
b. Start with either File-1 or File-2. Pick a string and generate its hash key.
c. If the slot for that key is empty, store the string in the hash table under that key.
d. Write the string to File-3.
e. Pick the next string and do the same.
f. If there is a collision in the hash table, compare the current string with the string(s) already stored in that slot. If they are the same, discard the current string.
g. If the current string differs from every string in that slot, add it to the hash table using chaining and write it to File-3.
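
A rough sketch of steps a through g, with the chaining made explicit instead of relying on a built-in set. The class name `ChainedStringSet`, the bucket count, and the use of Python's built-in `hash` are all assumptions chosen for illustration; in-memory lists stand in for the files.

```python
from itertools import chain


class ChainedStringSet:
    """Fixed-size hash table of strings that resolves collisions by chaining."""

    def __init__(self, bucket_count=1024):
        self.buckets = [[] for _ in range(bucket_count)]

    def add(self, s):
        """Return True if s was newly added, False if it was already stored."""
        bucket = self.buckets[hash(s) % len(self.buckets)]
        if s in bucket:    # step f: same slot and same string, so discard
            return False
        bucket.append(s)   # steps c and g: empty slot, or chain a different string
        return True


def merge_files(lines_a, lines_b):
    """Return the strings that would be written to File-3, in first-seen order."""
    table = ChainedStringSet()
    return [s for s in chain(lines_a, lines_b) if table.add(s)]


print(merge_files(["x", "y", "x"], ["y", "z"]))  # ['x', 'y', 'z']
```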


Complexity:

Time complexity: O(n+m) expected, assuming constant-time hashing and lookups

Space complexity: Suppose we start with a hash table that can hold O(m) keys. If there are many duplicates, its size stays O(m). In the worst case, with no duplicates, the total number of entries in the hash table is O(n+m).

- Towhid February 16, 2013

0 of 0 votes

If space is not a concern, add the strings from both files to a set. This eliminates duplicates and merges the files at the same time.

- swathi November 08, 2012

0 of 0 votes

That is just a restatement of the question. You should be designing the algorithm, not relying on predefined, language-specific functions.

- Mustafa November 08, 2012

0 of 0 votes

Memory is the issue. Even assuming only 4 bytes of hash per string (a full MD5 digest is 16 bytes) and no duplicate strings, the set would need at least 6,000,000 x 4 bytes.

- Nguyen November 08, 2012

0 of 0 votes

6,000,000 x 4 bytes = 24,000,000 bytes, roughly 22.9 MiB. Surely that is not an issue for memory?

- CameronWills November 08, 2012

0 of 0 votes

These days, computers often have more than 4 to 8 GB of memory. With just 6,000,000 strings of 1024 bytes each, that is only 6,000,000 * 1024 bytes, about 6 GB. Just read everything at once, build a tree, and merge -_-;
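
The memory estimates in this thread can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 6,000,000 string count and the 4-byte and 1024-byte sizes are the figures quoted in the comments, the 16-byte case is the size of a full MD5 digest, and per-entry overhead of any real hash table or tree is ignored.

```python
STRING_COUNT = 6_000_000
MIB = 1024 ** 2
GIB = 1024 ** 3


def size_in_bytes(bytes_per_entry, count=STRING_COUNT):
    """Raw payload size, ignoring container overhead."""
    return bytes_per_entry * count


four_byte_hashes = size_in_bytes(4)    # 4-byte hash per string
md5_digests = size_in_bytes(16)        # full 16-byte MD5 digest per string
raw_strings = size_in_bytes(1024)      # the strings themselves

print(f"{four_byte_hashes / MIB:.1f} MiB")  # 22.9 MiB
print(f"{md5_digests / MIB:.1f} MiB")       # 91.6 MiB
print(f"{raw_strings / GIB:.2f} GiB")       # 5.72 GiB
```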

- charsyam November 09, 2012

0 of 0 votes

Create a suffix tree or trie out of File-1, then traverse File-2 and compare each of its strings against the trie.
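
One way the trie variant might look, as a sketch under assumptions: in-memory lists stand in for the files, and the names `TrieNode`, `Trie`, and `merge_with_trie` are invented for this example. Inserting File-2 strings into the same trie also removes duplicates inside File-2.

```python
class TrieNode:
    __slots__ = ("children", "is_end")

    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_end = False   # True if a stored string ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Insert word; return True if it was not already in the trie."""
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        is_new = not node.is_end
        node.is_end = True
        return is_new


def merge_with_trie(file1_lines, file2_lines):
    """Build the trie from File-1, then keep only File-2 strings it lacks."""
    trie = Trie()
    merged = [s for s in file1_lines if trie.insert(s)]
    merged += [s for s in file2_lines if trie.insert(s)]
    return merged


print(merge_with_trie(["car", "cat"], ["cat", "cart"]))  # ['car', 'cat', 'cart']
```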

- gaurav November 09, 2012

0 of 0 votes

Assign a numerical value (hash value) to each string using radix 26, with a = 1 through z = 26. For example, cat = 3*(26 pow 2) + 1*(26 pow 1) + 20*(26 pow 0).
Two strings then either have different numerical values, or have the same numerical value and may or may not be the same string. To compare two strings, first check whether their numerical values are equal. Only if they are equal, compare letter by letter.
Now sort the smaller file, and for each word in the larger file, do a binary search in the smaller file (the search over the smaller file is shallower, and sorting it also takes less time).
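
A small sketch of this idea, assuming lowercase a-z words only and no duplicates inside either file. The function names are hypothetical. Sorting `(value, word)` tuples means the numeric value is compared first and the letters only on a tie, which matches the comparison order described above.

```python
from bisect import bisect_left


def radix26(word):
    """Radix-26 value of a lowercase word, with a=1 ... z=26."""
    value = 0
    for ch in word:
        value = value * 26 + (ord(ch) - ord("a") + 1)
    return value


def merge_by_binary_search(small, large):
    """Sort the smaller file by (value, word), then binary-search each
    word of the larger file and keep the ones that are not found."""
    keyed = sorted((radix26(w), w) for w in small)
    merged = [w for _, w in keyed]
    for w in large:
        key = (radix26(w), w)
        i = bisect_left(keyed, key)
        if i == len(keyed) or keyed[i] != key:
            merged.append(w)
    return merged


print(radix26("cat"))  # 2074
print(merge_by_binary_search(["dog", "cat"], ["bird", "cat", "emu"]))
# ['cat', 'dog', 'bird', 'emu']
```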

- kuldeep.hbti January 29, 2013

0 of 0 votes

1. Calculate the hash code of each string in File-1 and use it as a key to check whether an entry with that hash code already exists. If so, check File-3 to see whether the string has already been saved there.
2. If the string has not been saved in File-3, save it; otherwise, do nothing.
3. Repeat steps 1 and 2 for File-2.
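
One possible reading of these steps, sketched under assumptions the comment does not spell out: the files hold one string per line, only hash codes and File-3 byte offsets are kept in memory, and a hash hit is confirmed by re-reading the candidate line from File-3. The function name `merge_with_hash_index` and the choice of `zlib.crc32` as the hash code are illustrative only.

```python
import os
import zlib


def merge_with_hash_index(input_paths, output_path):
    """Merge line-oriented files into output_path without duplicates,
    keeping only hash codes and output offsets in memory."""
    index = {}  # hash code -> byte offsets in File-3 of strings with that code
    with open(output_path, "w+b") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                for raw in src:
                    line = raw.rstrip(b"\r\n")
                    code = zlib.crc32(line)
                    # Step 1: on a hash hit, confirm against File-3 itself.
                    already_saved = False
                    for offset in index.get(code, []):
                        out.seek(offset)
                        if out.readline().rstrip(b"\n") == line:
                            already_saved = True
                            break
                    # Step 2: save the string only if File-3 lacks it.
                    if not already_saved:
                        out.seek(0, os.SEEK_END)
                        index.setdefault(code, []).append(out.tell())
                        out.write(line + b"\n")
```

Calling `merge_with_hash_index(["file1.txt", "file2.txt"], "file3.txt")` (placeholder file names) would cover step 3, since both inputs go through the same loop. The trade-off is extra seeks in File-3 in exchange for not holding the strings in memory.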

- wennan he March 09, 2013