Google Interview Question for Software Engineers


Country: United States




3 of 3 votes

Looks like a MapReduce problem to me. Fork multiple Map jobs, each taking a part of the file and computing the unique IDs in it. The outputs of all Map jobs are aggregated by a final reducer. If the intermediate output is still too big, we can chain a sequence of MapReduce jobs.
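
A minimal single-process sketch of that idea (not a real MapReduce job), assuming a line-oriented file with one ID per line; the function names are just illustrative:

{{{
# Minimal single-process sketch of the map/reduce idea (not a real MapReduce job).
# Assumes a line-oriented file with one ID per line; names are illustrative.

def map_chunk(lines):
    """Map: emit the set of distinct IDs seen in one chunk of the file."""
    return {line.strip() for line in lines if line.strip()}

def reduce_sets(partial_sets):
    """Reduce: union the per-chunk sets into the final set of unique IDs."""
    unique = set()
    for s in partial_sets:
        unique |= s
    return unique

def unique_ids(path, chunk_size=100_000):
    partial_sets = []
    with open(path) as f:
        chunk = []
        for line in f:
            chunk.append(line)
            if len(chunk) == chunk_size:
                partial_sets.append(map_chunk(chunk))
                chunk = []
        if chunk:
            partial_sets.append(map_chunk(chunk))
    return reduce_sets(partial_sets)
}}}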

- Richeek, February 24, 2015
0 of 0 votes

I think this answer is along the lines of what my interviewer was looking for.

- LeetJile, February 24, 2015
0 of 0 votes

For this, the IDs will have to be sorted using external sorting (see the Wikipedia article on "external sorting" for further details).
After sorting, we load memory-sized chunks of the sorted file. We can easily find unique IDs by comparing adjacent IDs within each chunk and writing them out to a file. We also have to carry the last ID of a chunk over to the next chunk, rather than writing it out immediately, to handle a duplicate ID that straddles a chunk boundary.
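
A sketch of the deduplication pass over the already-sorted file, assuming one ID per line (the external sort itself is done beforehand, e.g. by an external merge sort). Because the file is read as a stream, remembering the previous ID carries it across chunk boundaries automatically:

{{{
# Sketch of the deduplication pass over the already (externally) sorted file.
# Assumes one ID per line; the external sort itself is done beforehand.

def write_distinct_ids(sorted_path, out_path):
    prev = None
    with open(sorted_path) as src, open(out_path, "w") as dst:
        for line in src:
            uid = line.strip()
            if not uid:
                continue
            if uid != prev:              # adjacent duplicates are skipped
                dst.write(uid + "\n")
            prev = uid
}}}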

- Adi, February 23, 2015
0 of 0 votes

Depending on the range of the IDs, one could potentially use the simple approach of a hash table to track unique IDs. That's probably one of the first questions to ask the interviewer, even though they'll probably say it's an unspecified large range.
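
If the interviewer did say the IDs are non-negative integers below some known bound (an assumption, not stated in the thread), a bit array is a compact variant of this idea:

{{{
# Assumption (not stated in the thread): IDs are non-negative integers below MAX_ID.
# Under that assumption, a bit array of MAX_ID bits can mark IDs already seen.

MAX_ID = 10**8                       # hypothetical bound; 10^8 bits is ~12.5 MB

def distinct_ids(path):
    seen = bytearray(MAX_ID // 8)    # one bit per possible ID
    out = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            uid = int(line)
            byte, bit = divmod(uid, 8)
            if not (seen[byte] >> bit) & 1:
                seen[byte] |= 1 << bit
                out.append(uid)
    return out
}}}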

- JW, February 23, 2015
0 of 0 votes

Adding to JW's response, another question would be how big a unique ID can be: are they all fixed length, or variable length?

- Sam, February 23, 2015
0 of 0 votes

Add all the IDs to a hash set as keys and return them.

- aaa, February 23, 2015
0 of 0 votes

We can use hashing on the UIDs, and if the IDs live in a large file we can combine hashing with partitioning the file according to the available main memory; a sketch of that idea is below.
Please correct me if I am wrong.
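
One way to read this: route each ID by a hash into one of several bucket files, so every occurrence of the same ID lands in the same bucket, and each bucket is then small enough to deduplicate with an in-memory set. The bucket count and file names below are hypothetical:

{{{
# Sketch of hash partitioning: every occurrence of an ID hashes to the same
# bucket file, so each bucket can be deduplicated on its own with an in-memory
# set. NUM_BUCKETS and the bucket file names are hypothetical; pick NUM_BUCKETS
# so that one bucket fits in the available main memory.

import hashlib

NUM_BUCKETS = 64

def bucket_of(uid):
    return int(hashlib.md5(uid.encode()).hexdigest(), 16) % NUM_BUCKETS

def partition(path):
    outs = [open(f"bucket_{i}.txt", "w") for i in range(NUM_BUCKETS)]
    try:
        with open(path) as f:
            for line in f:
                uid = line.strip()
                if uid:
                    outs[bucket_of(uid)].write(uid + "\n")
    finally:
        for o in outs:
            o.close()

def dedup_buckets(out_path):
    with open(out_path, "w") as dst:
        for i in range(NUM_BUCKETS):
            seen = set()
            with open(f"bucket_{i}.txt") as src:
                for line in src:
                    uid = line.strip()
                    if uid not in seen:
                        seen.add(uid)
                        dst.write(uid + "\n")
}}}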

- sumit.polite, February 23, 2015
0 of 0 votes

If possible, I'd just do the appropriate grep/cut/awk and pipe it through sort and uniq. I'd probably mention that in the interview because it takes less than a minute to explain. Obviously it's not the optimal approach, but it's the most practical one in many real world situations. It completely breaks down for huge files, not just because it must all be held in memory but because of the sorting.

Otherwise I'd take a MapReduce approach. For a chunk of the file, use a hash map to find unique IDs; there's no need to sort here. In fact, if 100% accuracy is not required, and one only needs to query whether or not a given ID is in the file, then one could just construct a bloom filter.
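
A toy Bloom filter along those lines (the sizes are illustrative; it can report false positives but never false negatives):

{{{
# Toy Bloom filter for approximate "is this ID in the file?" queries.
# Sizes are illustrative; it can report false positives, never false negatives.

import hashlib

class BloomFilter:
    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for pos in self._positions(item))
}}}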

Either way, aggregate the hash tables together by merging their keysets.

- nilkn, February 27, 2015
0 of 0 votes

I think a better (and easier) solution is to read the file line by line and store the IDs in a trie.
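
A minimal trie sketch, assuming one ID per line; insert() returns False when the ID was already present, so duplicates are caught as the file is streamed:

{{{
# Minimal trie sketch, one ID per line. insert() returns False when the ID was
# already present, so duplicates are detected as the file is streamed.

def insert(trie, uid):
    node = trie
    for ch in uid:
        node = node.setdefault(ch, {})
    if node.get("$"):        # "$" marks the end of an ID already seen
        return False
    node["$"] = True
    return True

def unique_ids(path):
    trie, out = {}, []
    with open(path) as f:
        for line in f:
            uid = line.strip()
            if uid and insert(trie, uid):
                out.append(uid)
    return out
}}}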

- Anonymous, March 17, 2015
0 of 0 votes

I was thinking along the same lines. For a large data set, a distributed hash will work: first hash the UID (or take its first k bits) to pick the right hash table within the set of tables, then hash the UID again to locate its entry in that table (incrementing the counter if it exists, inserting it otherwise). Finally, each table can be queried individually for unique IDs. Both the hashing and the querying can run across multiple threads, where each thread reads a section of the file at some offset and works over its own hash tables.
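
A single-process sketch of that two-level hashing idea (the partition count and hash choice are arbitrary): the first k bits of a hash pick a partition, each partition keeps its own ID-to-count map, and the partitions can then be queried independently, for example one thread per partition:

{{{
# Single-process sketch of the two-level hashing idea: the first k bits of a
# hash pick a partition, and each partition keeps its own ID -> count map.
# Partitions are independent, so they can be filled and queried by separate
# workers (e.g. one thread per section of the file / per partition).

from collections import Counter
import hashlib

K = 4                                   # top k bits select one of 2**K partitions

def partition_index(uid):
    h = int(hashlib.md5(uid.encode()).hexdigest(), 16)
    return h >> (128 - K)               # first k bits of the 128-bit hash

def build_partitions(path):
    partitions = [Counter() for _ in range(1 << K)]
    with open(path) as f:
        for line in f:
            uid = line.strip()
            if uid:
                partitions[partition_index(uid)][uid] += 1
    return partitions

def ids_seen_once(partitions):
    # each partition can be queried on its own
    return [uid for part in partitions for uid, n in part.items() if n == 1]
}}}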

- ali.kheam, May 03, 2017
0 of 0 votes

MapReduce will do too, though I am only a little familiar with this paradigm.
The map takes a block of file data and outputs UID:1 for each UID seen.
Before the IDs (the keys) are shuffled to a reducer, a combiner can run on the output of each map to collapse IDs repeated within that block; the global filtering still has to happen in the reducer.
The reducer aggregates the counts per ID and, for any ID, if count == 1, emits (unique id, 1); otherwise it consumes the ID.
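
A single-process sketch of those map / combine / reduce functions, reading "unique" as "appears exactly once in the whole file":

{{{
# Single-process sketch of the map / combine / reduce functions described above.
# "Unique" is read here as "appears exactly once in the whole file".

from collections import Counter
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    for line in lines:
        uid = line.strip()
        if uid:
            yield uid, 1                  # emit UID:1 for each UID seen

def combiner(pairs):
    # local pre-aggregation before the shuffle: collapse repeats within one block
    local = Counter()
    for uid, n in pairs:
        local[uid] += n
    return local.items()

def reducer(shuffled_pairs):
    # pairs are grouped by key (the shuffle); emit only IDs whose total count is 1
    ordered = sorted(shuffled_pairs, key=itemgetter(0))
    for uid, group in groupby(ordered, key=itemgetter(0)):
        if sum(n for _, n in group) == 1:
            yield uid, 1
}}}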

- ali.kheam, May 03, 2017

