Huawei Interview Question
Software Engineer / Developers
1. Read 2 GB (= 2 files) of the data into main memory and sort it by some conventional method, like quicksort.
2. Write the sorted data to disk.
3. Repeat steps 1 and 2 until all of the data is in sorted 2 GB chunks (there are 10 GB / 2 GB = 5 chunks), which now need to be merged into one single output file.
4. Read the first 1 GB of each sorted chunk into input buffers in main memory and allocate the remaining 5 GB for an output buffer. (In practice, it might provide better performance to make the output buffer larger and the input buffers slightly smaller.)
5. Perform a 5-way merge and store the result in the output buffer. Whenever the output buffer is full, write it to the final sorted file and empty it.
6. If any of the 5 input buffers becomes empty, refill it with the next 1 GB of its associated 2 GB sorted chunk until no more data from the chunk is available.
This is the key step that makes external merge sort work externally -- because the merge algorithm only makes one pass sequentially through each of the chunks, each chunk does not have to be loaded completely; rather, sequential parts of the chunk can be loaded as needed.
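The steps above can be sketched in Python as follows. This is only a small-scale illustration, not a tuned implementation: it assumes the input is a text file with one integer per line and no blank lines, and the names `write_sorted_runs`, `merge_runs`, `external_sort` and the `lines_per_run` parameter are made up for this example. The standard library's `heapq.merge` does the k-way merge lazily, and Python's own file buffering stands in for the explicit input and output buffers.

```python
import heapq
import os
import tempfile
from itertools import islice


def write_sorted_runs(input_path, run_dir, lines_per_run):
    """Steps 1-3: read a memory-sized piece, sort it, write it out as a run."""
    run_paths = []
    with open(input_path) as src:
        while True:
            # Read at most lines_per_run values; this is the "2 GB" piece.
            chunk = [int(line) for line in islice(src, lines_per_run)]
            if not chunk:
                break
            chunk.sort()  # any conventional in-memory sort works here
            path = os.path.join(run_dir, f"run_{len(run_paths)}.txt")
            with open(path, "w") as out:
                out.writelines(f"{value}\n" for value in chunk)
            run_paths.append(path)
    return run_paths


def merge_runs(run_paths, output_path):
    """Steps 4-6: k-way merge that reads each run sequentially, once."""
    files = [open(path) for path in run_paths]
    try:
        # Each stream yields one value at a time, so no run is loaded whole.
        streams = [(int(line) for line in f) for f in files]
        with open(output_path, "w") as out:
            out.writelines(f"{value}\n" for value in heapq.merge(*streams))
    finally:
        for f in files:
            f.close()


def external_sort(input_path, output_path, lines_per_run):
    """Sort input_path into output_path; returns the number of runs created."""
    with tempfile.TemporaryDirectory() as run_dir:
        runs = write_sorted_runs(input_path, run_dir, lines_per_run)
        merge_runs(runs, output_path)
        return len(runs)
```

Calling `external_sort("data.txt", "sorted.txt", lines_per_run)` with `lines_per_run` chosen so that one run fits in memory would mirror the 10 GB / 2 GB setup, where 5 runs are created and then merged in a single pass.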
In step 4, how can you bring 1 GB from each of the sorted chunks (5 in your case) into main memory, which is only 2 GB? I guess you will have to divide main memory into as many buffers as there are chunks, i.e. 5 (which comes to 2 GB / 5 = 400 MB each). Then read 400 MB from each sorted chunk into main memory, merge, and write the result to the output buffer. As each 400 MB buffer is emptied, load the next 400 MB from its sorted chunk and repeat the process. Correct me if I'm wrong.
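The buffer arithmetic in this reply can be checked with a few lines of Python. This is a rough sketch under assumptions: 1 GB is taken as 1000 MB (as the 400 MB figure implies), all buffers are the same size, and the function name `plan_merge_buffers` and its parameters are invented for illustration. Note that giving all 2 GB to the 5 input buffers leaves nothing for an output buffer, so the second call reserves one equal-sized share for output.

```python
import math


def plan_merge_buffers(memory_mb, run_size_mb, num_runs, output_buffers=0):
    """Split memory into equal buffers and count the loads needed per run.

    output_buffers is how many equal-sized shares are set aside for output
    (an assumption for this sketch; real systems often size it differently).
    """
    buffer_mb = memory_mb // (num_runs + output_buffers)
    loads_per_run = math.ceil(run_size_mb / buffer_mb)
    return buffer_mb, loads_per_run


# All memory to the 5 input buffers: 400 MB each, 5 loads per 2 GB run.
print(plan_merge_buffers(2000, 2000, 5))       # (400, 5)

# One extra share reserved for output: 333 MB each, 7 loads per run.
print(plan_merge_buffers(2000, 2000, 5, 1))    # (333, 7)
```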
external merge sort
- Anonymous June 14, 2011