Amazon Interview Question
SDE-2s
Country: India
Interview Type: Phone Interview
External Sorting
External sorting is a term for a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead they must reside in the slower external memory (usually a hard drive). External sorting typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted sub-files are combined into a single larger file.
This is very easy if you know N-way merge.
Following are the steps:
1) Make 10 chunks of the file, of 1 GB each.
2) Sort each chunk individually in memory and write it back to disk. Insertion sort is only a good choice when n is small, which is not the case for a 1 GB chunk, so use merge sort, which is O(n log n) in the worst case. Tip: ask the interviewer whether O(n) auxiliary space is available, because merge sort requires it; if it is not, an in-place sort such as heapsort is also O(n log n) in the worst case.
3) After all chunks are sorted, read the first 1 GB/(10+1), roughly 90 MB, of each sorted chunk into memory. The denominator is 10+1 because memory is divided into 10 input buffers plus one output buffer that holds the merged result.
4) Start a 10-way merge algorithm.
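5) As soon as the output buffer is full, write it to the output file on disk and empty the buffer.
6) When an input buffer runs empty, refill it with the next portion of the same sorted chunk. A chunk is finished once all of its data has been read.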
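The steps above can be sketched in Python roughly as follows. This is only a minimal illustration under some assumptions that are not in the question: the file holds one integer per line with no blank lines, `lines_per_run` stands in for the 1 GB memory budget, and the function names are made up for this example. Python's built-in `sort` replaces the hand-written merge sort, and the buffered file objects play the role of the input and output buffers from steps 3 to 6.

```python
import heapq
import os
import tempfile
from itertools import islice


def write_sorted_runs(input_path, run_dir, lines_per_run):
    """Sorting phase: read a chunk that fits in memory, sort it, write it out."""
    run_paths = []
    with open(input_path) as src:
        while True:
            # Read at most lines_per_run records (the in-memory chunk).
            chunk = [int(line) for line in islice(src, lines_per_run)]
            if not chunk:
                break
            chunk.sort()
            run_path = os.path.join(run_dir, f"run_{len(run_paths)}.txt")
            with open(run_path, "w") as run:
                run.writelines(f"{value}\n" for value in chunk)
            run_paths.append(run_path)
    return run_paths


def merge_runs(run_paths, output_path):
    """Merge phase: k-way merge of the sorted runs into one output file."""
    files = [open(path) for path in run_paths]
    try:
        # Each stream lazily yields the integers of one sorted run.
        streams = [(int(line) for line in f) for f in files]
        with open(output_path, "w") as out:
            # heapq.merge keeps only one pending record per run in memory.
            for value in heapq.merge(*streams):
                out.write(f"{value}\n")
    finally:
        for f in files:
            f.close()


def external_sort(input_path, output_path, lines_per_run):
    """Split into sorted runs, merge them, then discard the temporary runs."""
    with tempfile.TemporaryDirectory() as run_dir:
        run_paths = write_sorted_runs(input_path, run_dir, lines_per_run)
        merge_runs(run_paths, output_path)
```

For example, `external_sort("input.txt", "sorted.txt", lines_per_run=1_000_000)` would sort a hypothetical `input.txt` while holding at most one million records in memory at a time.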
you mean split, sort and merge (concatenation wouldn't necessarily produce a sorted result)
- Chris June 06, 2017
first pass: read the data in 10 chunks of 1 GB, sort each chunk in memory, and write it to disk (10 files)
second pass: open the 10 1 GB files (as streams), create a new file, and merge the content of the 10 files into the one target file (repeatedly pick the smallest record among the tops of the streams and write it to the target) until done. Then delete the 10 intermediary files.
first pass: O(n*lg(n)); second pass: O(n) for a fixed number of files; total: O(n*lg(n)), but a bit slow in practice because of the disk activity
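The "pick the smallest record on top of each stream" step of the second pass could look something like the sketch below. It uses a min-heap holding one record per stream, so each pick costs O(lg(k)) for k streams, which is a constant for k = 10. The name `k_way_merge` and the in-memory lists in the usage note are just illustrative; in the real second pass the streams would be the 10 sorted files.

```python
import heapq

_DONE = object()  # sentinel marking an exhausted stream


def k_way_merge(streams):
    """Yield the items of several already-sorted iterables in sorted order."""
    iterators = [iter(stream) for stream in streams]

    # Seed the heap with the first record of every non-empty stream.
    heap = []
    for index, iterator in enumerate(iterators):
        first = next(iterator, _DONE)
        if first is not _DONE:
            heap.append((first, index))
    heapq.heapify(heap)

    while heap:
        # The heap root is the smallest record among all stream tops.
        value, index = heap[0]
        yield value
        following = next(iterators[index], _DONE)
        if following is _DONE:
            heapq.heappop(heap)  # this stream is finished
        else:
            # Replace the root with the next record from the same stream.
            heapq.heapreplace(heap, (following, index))
```

For instance, `list(k_way_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))` returns `[1, 2, 3, 4, 5, 6, 7, 8, 9]`.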