Yahoo Interview Question for Software Engineer / Developers


Team: Search
Country: United States
Interview Type: In-Person




Comment hidden because of low score. Click to expand.
13
of 13 vote

Assuming that they wanted a algorithmic approach,

Use: a two pass approach
First pass chunk sorting:
1. Load chunks of 4 GB data in memory and do an in place sort (Merge Sort or Quick Sort).
2. Write the chunk back to the same place on disk
3. Repeat 250 times (until end of file or all data is consumed)

Second Phases "merge":
Now you have a choice to merge, what is going to be your "Bucket size" which will govern the N in a N-way-merge. (This gets real tricky, real fast)
In this case we have 250 chunks of 4 GB each,
One way to do this is,
1. Load first 8.192 MB from each of 250 chunks (250*8.192 MB = 2GB) this is our input buffer and rest of the 4 GB is our output buffer
2. Do a 250 ways merge on this 250 sorted bins / buckets
3. When any of the 250 bins is empty load it with the next 8.192 MB from the associated chunks
4. When Output buffer of 2 GB is full - write out the buffer to the external storage.

Additional passes might be necessary to ensure all the numbers are sorted.

- ASK November 09, 2012 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

How can you be sure that the merged 2GB data won't overwrite the data that hasn't been merged yet?

For example, if the first 8.192MB from the first chunk happens to be the biggest numbers in the file. The first 2GB write to the file would overwrite and permanently loose the first 2GB of the file.

- Miro November 11, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

@Miro: The output is written to a new file. Nothing is overwritten.

- eugene.yarovoi November 11, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

What are your thoughts on how this would be done if you need to sort the 1000GB file in-place ?

- riddimon February 03, 2014 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

@riddimon: Heapsort would be an example of an in-place sort, but it's not very appropriate for files. The trick with manipulating files is that you want to read and write to them in a contiguous way because disk seeks are slow. Heapsort accesses data in a highly non-contiguous way as you move up and down the heap, so it's not a great choice.

Quicksort might be a good fit in this case. The main challenge of quicksort is the in-place partitioning of an array with respect to a pivot, which is something that can be done without much random access of the file. You write elements starting from the front and elements starting from the back in a contiguous way (buffer up values in memory until you have enough of them to write). Quicksort isn't really 100% in-place, requiring something like O(log N) memory, but that's a very small amount even for large N.

Quicksort does have shortcomings like degenerate cases that potentially take O(N^2) time. If you don't want to use random pivots to avoid degenerate cases, it's possible to choose the median as the pivot. There are linear-time median-selection algorithms like "median of medians", which can also be done with only logarithmic extra memory.

- eugene.yarovoi February 06, 2014 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

How you determined input buffer size as 2 GB ?

- Anonymous November 15, 2018 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

How you determined input buffer size as 2GB?

- Ravi November 15, 2018 | Flag
Comment hidden because of low score. Click to expand.
-2
of 2 vote

Any one can float example Multi threaded Sorting here

- Ashis Kumar March 04, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
Comment hidden because of low score. Click to expand.
0
of 2 votes

what a great value-add answer when there's already an answer on page that says external sort but actually gives all the details

- Anonymous November 09, 2012 | Flag


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More