If you have 1TB of unsorted lo

Microsoft Interview Question for Software Engineers

1

of 1 vote

15
Answers
If you have 1TB of unsorted long integers but 1GB of memory, devise an algorithm to efficiently sort the integers. What is the time complexity? What is the space complexity?
- popeye123 January 29, 2016 in United States | Report Duplicate | Flag | PURGE
Microsoft Software Engineer

Email me when people comment.

An error occurred in subscribing you.

Country: United States

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 5 vote

First split the 1 TB data into 1024 blocks and then sort these files/data individually.
Then open all files (1024 of them. just have a pointer to the beginning of the file, not load the entire file). read the first entry from all the files and then put them into a min heap. So this way, we get 1st 1000 integers which are sorted globally. Next again load the heap with next 1000 entries from the file, continue this until all the entries in the file are finished.

- rangacoder February 02, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

i think this won't work.
say i have a sorted already 1T data
and according to your way, the first 1024 been sorted out of the first batch is going to ruin the originally sorted 1T list

- bill February 18, 2016 | Flag

Comment hidden because of low score. Click to expand.

of 2 vote

external MergeSort

- zr.roman January 29, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

You just store the difference between the long integers, that way you can save up memory.

You can use bucket sort to sort the array/buffer where you stored the difference then since you know the maximum and the minimum number will be a long integer so the constraints are met and your time complexity would be O(N).

- Anonymous January 29, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I really liked this one. But this one has one issue.

Difference of 2 longs can be a long as well.

Long.MAX_VALUE - 0 = Long.Max_VALUE

- RangaCoder February 02, 2016 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

First we could check with the interviewer whether the numbers are in any range. For example if the numbers represent human range. if yes ,
then we need to store numbers between 1 - 130 and counts of each one of them.

Its just about asking right questions.

If the numbers are really distributed and no correlation
we can bring in numbers in memory and use merge sort
That we we bring data 1000 + 500 + 250 + 125 + 63 + .... times

- Ananymouse February 02, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Assumptions: Size of integer is 4 bytes

Range of integers is -2Billion (Integer.minValue) to 2Billion (Integer.maxValue)

Space required to hold 4Billion numbers = 4B * 4bytes = 16B bytes = 16GB

But, it is given that we have 1TB of data which are numbers. Obviously, there are a lot of duplicates. We will need to find out, which of them is repeated how many times.
We can create a hash table of all numbers in the integer range (from -2Billion to 2Billion) as key and the count of their occurrence as value. Count is unsigned int.
Size of 1 entry of hash table (4 Bytes for number and 4 Bytes for count) = 8 Bytes
Total number of entries in the hash table = 4Billion
Total size of hash table = 4Billion * 8 Bytes = 32 Billion bytes = 32 GB

Available RAM is1GB= We will need 32 different hash tables to hold 32GB of data. Alternatively, we can use 64 hash tables, each with 500MB size.

Here is the algorithm:
1. Read 500MB of data from the 1TB file
2. Read numbers sequentially from step 1
3. Pick the hash table (1 to 64) depending on where the number read in step 2 fits. Add the entry and increment the count against the number. Save hashtable to disk if you have to pick another hashtable.
4. Repeat steps 2-3 until you exhaust the number
5. Goto step 1 to read the next 500MB worth of data from disk

Complexity of this algorithm is O(n)

Limitation: This will not work of any number(s) is repeated more than 4B times (range of int)

- RD February 02, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

The question is about long integers which is 8 bytes long

- Anonymous February 03, 2016 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

This requires a lot of copying from memory to disk picking out the right hash table for each number! So even though its O(n), it may be highly inefficient.

- fernando February 06, 2016 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

This will require a lot of read/write from memory to disk, in order to pick out the appropriate hash table for each number. So even though it is O(n), it may be highly inefficient.

- fernando February 06, 2016 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

This will require a lot of read/write from memory to disk, in order to pick out the appropriate hash table for each number. So even though it is O(n), it may be highly inefficient.

- fergow February 06, 2016 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Divide the array in blocks of 0.5Gb each. Then proceed in a Bubble Sort fashion by sorting adjacent blocks. This is equivalent to the swapping procedure used between scalars in Bubble Sort.

If B is the number of blocks

For i=1 to B
For j=1 to B-1
sort(Block_j merged to Block_{j+1})

The sorting can be performed in memory. The complexity is quadratic in B.

- Simo March 15, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- Simo March 15, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

(sorry for the previous posts)

Divide the array in blocks of 0.5Gb each. Then proceed in a Bubble Sort fashion by sorting adjacent blocks. This is equivalent to the swapping procedure used between scalars in Bubble Sort.

If B is the number of blocks

For i=1 to B
For j=1 to B-i
sort(Block_j merged to Block_{j+1})

The sorting can be performed in memory. The complexity is quadratic in B.

- Simo March 15, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Consider 1 TB array has size N, and 1 GB array has size n
1. Partition N into block of size n
2. Inplace/using memory Sort all blocks
3. use memory as min heap and take first element of every sorted block
4.heapify and take out first element from min heap push to first position of N array
5. take second element from same array n add to min heap. and repeat (4,5) until we get sorted array

Complexity : (n+1)N * log(n)

- Akshay Dhule October 18, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think we can convert this question to merge K sorted list, right?

- Jason Guo November 09, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

I will suggest quicksort in this case
check quick-sort algorithm tutorial

Worst case performance O(n2)
Average case performance O(n log n)
Average space complexity O(log n)

- chovatiyabhavesh January 29, 2016 | Flag Reply

CareerCup

Microsoft Interview Question for Software Engineers

Books

Videos

Resume Review

Mock Interviews