Interview Question
SDE-2
Country: India
Interview Type: Phone Interview
A BST's in-order traversal yields the elements in sorted order.
You can even keep a count in each node, so duplicates don't add extra nodes.
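A minimal sketch of that idea (the class and method names are illustrative, not from the original post): each distinct value gets one node carrying a count, and the in-order traversal emits each value `count` times.

```java
import java.util.ArrayList;
import java.util.List;

class CountedBst {
    static class Node {
        int value, count = 1;
        Node left, right;
        Node(int v) { value = v; }
    }

    Node root;

    // Insert a value; duplicates just bump the count instead of adding nodes,
    // so the tree holds at most K nodes for K distinct values.
    void insert(int v) {
        root = insert(root, v);
    }

    private Node insert(Node n, int v) {
        if (n == null) return new Node(v);
        if (v < n.value) n.left = insert(n.left, v);
        else if (v > n.value) n.right = insert(n.right, v);
        else n.count++;
        return n;
    }

    // In-order traversal emits each value `count` times, yielding sorted output.
    List<Integer> inOrder() {
        List<Integer> out = new ArrayList<>();
        inOrder(root, out);
        return out;
    }

    private void inOrder(Node n, List<Integer> out) {
        if (n == null) return;
        inOrder(n.left, out);
        for (int i = 0; i < n.count; i++) out.add(n.value);
        inOrder(n.right, out);
    }

    public static void main(String[] args) {
        CountedBst t = new CountedBst();
        for (int v : new int[] {4, 3, 1, 3, 2, 1, 1, 3, 4}) t.insert(v);
        System.out.println(t.inOrder()); // prints [1, 1, 1, 2, 3, 3, 3, 4, 4]
    }
}
```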
How about parallelizing this solution?
Spawn M threads (where M could equal the number of cores), each inspecting N/M elements of the array (assume we split the array into M partitions). We'd also keep a count of distinct nodes while populating the BST in each thread, so that the moment the count reaches K, we can exit the thread and return the in-order traversal of that BST.
So, in the best possible scenario, at least one of the partitions contains all K distinct numbers, in which case the thread processing that partition hits the solution. If all the partitions contain fewer than K distinct elements, we'll have to merge the results from the M threads.
In a fairly even distribution where K << N, the best-case scenario is likely to be hit frequently. What do you guys say?
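A rough sketch of this partition-and-merge scheme, using `TreeMap` (a balanced BST) as each thread's counted BST; the thread-pool plumbing and all names here are my assumptions, not part of the original description:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionedSort {
    // Each of m tasks builds a counted BST (TreeMap) over its slice of the
    // array; the m count maps are then merged and expanded into sorted order.
    static int[] sort(int[] arr, int m) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(m);
        List<Future<TreeMap<Integer, Integer>>> parts = new ArrayList<>();
        int chunk = (arr.length + m - 1) / m;
        for (int p = 0; p < m; p++) {
            final int lo = p * chunk, hi = Math.min(arr.length, lo + chunk);
            Callable<TreeMap<Integer, Integer>> task = () -> {
                TreeMap<Integer, Integer> counts = new TreeMap<>();
                for (int i = lo; i < hi; i++) counts.merge(arr[i], 1, Integer::sum);
                return counts;
            };
            parts.add(pool.submit(task));
        }
        // Merge the per-thread count maps, then expand into the output array.
        TreeMap<Integer, Integer> merged = new TreeMap<>();
        for (Future<TreeMap<Integer, Integer>> f : parts)
            for (Map.Entry<Integer, Integer> e : f.get().entrySet())
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
        pool.shutdown();
        int[] out = new int[arr.length];
        int i = 0;
        for (Map.Entry<Integer, Integer> e : merged.entrySet())
            for (int c = 0; c < e.getValue(); c++) out[i++] = e.getKey();
        return out;
    }
}
```

Note that merging count maps is cheap (at most M*K entries), which is what makes the "merge results from the M threads" fallback inexpensive when K << N.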
Might he have been looking for a counting-sort-type approach, since you know that there are only K distinct numbers (presumably K is relatively small)? This could be a very large improvement over an N log N sort.
I tried this recently to sort a very large set of numbers (10,000,000) that were bounded within a range, versus STL's generic sort<uint32_t> algorithm, and got about an 80% reduction in run time, so it's definitely a good solution if you know your data is range-bounded.
Per multicore (and I'm guessing here), we could divide a counting or key-indexed sort between different cores by breaking the O(N) steps into M pieces (where you have M cores), as each of the M pieces should be parallelizable (is that even a word?).
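One way this might look, with each core tallying its own slice into a private count array so the counting pass needs no synchronization (the names and the parallel-stream plumbing are illustrative assumptions):

```java
import java.util.stream.IntStream;

class ParallelCount {
    // Split the O(N) counting pass into m pieces: each piece tallies its own
    // slice into a private count array, then the m local arrays are summed
    // while expanding the result in value order.
    static int[] sort(int[] arr, int maxValue, int m) {
        int chunk = (arr.length + m - 1) / m;
        int[][] local = new int[m][maxValue + 1];
        IntStream.range(0, m).parallel().forEach(p -> {
            int hi = Math.min(arr.length, (p + 1) * chunk);
            for (int i = p * chunk; i < hi; i++) local[p][arr[i]]++;
        });
        int[] out = new int[arr.length];
        int idx = 0;
        for (int v = 0; v <= maxValue; v++) {
            int total = 0;
            for (int p = 0; p < m; p++) total += local[p][v];
            for (int c = 0; c < total; c++) out[idx++] = v;
        }
        return out;
    }
}
```

Private per-core count arrays avoid contended atomic increments; the final summation touches only m * (maxValue + 1) counters, which is small when the range is small.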
Yeah, but the range might not be known.
I don't think an extra O(N) traversal to find the range is a good idea, though it would make this work.
If we know the value of K, we can make K virtual lists inside the existing array itself: keep track of the beginning and end of the K lists with 2K indexes into the dataset, and keep swapping data until everything is sorted.
A merge-sort-style algorithm will be useful: break the input, sort the smaller parts, and merge the results.
// Assuming the k distinct numbers are 1..k. separators[x] holds the index of
// the last element whose value is <= x+1, so the array is maintained as k
// ordered groups while each element is inserted. Runs in O(N*K) time, O(K) space.
class Program
{
    static void Main(string[] args)
    {
        int[] arr = { 4, 3, 1, 3, 2, 1, 1, 3, 4 };
        int k = 4;
        Sort(arr, k);
    }

    static void Sort(int[] arr, int k)
    {
        int[] separators = new int[k];
        for (int x = 0; x < k; x++)
        {
            separators[x] = -1; // all k groups start empty
        }
        for (int i = 0; i < arr.Length; i++)
        {
            int t = arr[i];
            // Bubble arr[i] leftward: swap it with the first element of each
            // group of larger values, so the new element ends up appended to
            // group t and each larger group shifts right by one slot.
            for (int j = k - 1; j >= t; j--)
            {
                Swap(arr, separators[j] + 1, separators[j - 1] + 1);
            }
            // Groups t..k all grew by one position.
            for (int x = t - 1; x < k; x++)
            {
                separators[x]++;
            }
        }
    }

    static void Swap(int[] arr, int index1, int index2)
    {
        int temp = arr[index1];
        arr[index1] = arr[index2];
        arr[index2] = temp;
    }
}
One approach is to split the values between the cores and let each core sort its assigned values individually. Then create a min-heap containing the first entry of every sorted sub-array, along with a reference to its source sub-array. Now extract the minimum from the heap, write it to the destination, and insert the next value from the sub-array the extracted value came from, restoring the min-heap property after the insertion. Follow this procedure until all data has been written to the destination. It takes O((n/m)log(n/m)) for each core to sort, and each heap operation during the merge is O(log m), so the merge phase is O(n log m). Hence O(n log n) total work, given m cores.
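A sketch of the merge phase using Java's `PriorityQueue` as the min-heap (the entry layout and names are my own, not from the original comment):

```java
import java.util.PriorityQueue;

class HeapMerge {
    // Merge m individually sorted sub-arrays with a min-heap keyed on the
    // current head of each sub-array; each extract/insert costs O(log m),
    // so merging n total elements is O(n log m).
    static int[] merge(int[][] sorted) {
        // heap entries: {value, which sub-array it came from, index within it}
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> a[0] - b[0]);
        int n = 0;
        for (int p = 0; p < sorted.length; p++) {
            n += sorted[p].length;
            if (sorted[p].length > 0) heap.add(new int[] {sorted[p][0], p, 0});
        }
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            int[] top = heap.poll();
            out[i] = top[0];
            int p = top[1], next = top[2] + 1;
            // refill the heap from the sub-array the minimum came from
            if (next < sorted[p].length)
                heap.add(new int[] {sorted[p][next], p, next});
        }
        return out;
    }
}
```

For example, merging the per-core results `{1,3,4}`, `{1,2,3}`, `{1,3,4}` yields the fully sorted array.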
Use counting sort, writing the result back into the input array. Time: O(N); space: O(max value), which is O(K) when the distinct values are 1..K.
public static void sort( int[] arr ){
    // find the largest value directly, so no utility library is needed
    int max = 0;
    for( int value : arr ){
        if( value > max ) max = value;
    }
    // tally how many times each value occurs
    int[] countArr = new int[max + 1];
    for( int value : arr ){
        countArr[value] += 1;
    }
    // overwrite arr with each value repeated count times, in ascending order
    int baseIndex = 0;
    int count;
    for( int i = 0; i < countArr.length; i++ ){
        count = countArr[i];
        while( count > 0 ){
            arr[baseIndex] = i;
            ++baseIndex;
            --count;
        }
    }
}
This code assumes that all values in the array are integers greater than or equal to 0, so you'd probably need two count arrays to account for negative numbers.
Also, if you want to get very detailed, the "for( int value : arr ){" loop could be changed to index the array explicitly and split the values by sign, i.e.:
for( int i = 0; i < arr.length; i++ ){
    if( arr[i] >= 0 ){
        positiveCountArr[arr[i]] += 1;
    } else {
        negativeCountArr[-arr[i]] += 1;
    }
}
Then you could first parallelize by processing negativeCountArr on one processor and positiveCountArr on another; further parallelism would be achieved by dividing the range in each count array in half.
This implementation is pretty flexible regarding parallelism because the results don't move around. But it could be pretty inefficient if the largest number is high, because you could be counting to 2 billion for a sparse array.
Use a nice binary search tree (like a red-black tree, or anything with a quick O(log k) search). Add the number when it does not exist yet.
It takes at most ~n log k time, so it's O(n log k).
If k is small, log k is a small constant, and it's ~O(n).
- Mo Lam May 23, 2013
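A compact sketch of this using `TreeMap`, which is a red-black tree under the hood (the class and method names are illustrative):

```java
import java.util.Map;
import java.util.TreeMap;

class RedBlackSort {
    // TreeMap is a red-black tree, so each lookup/insert is O(log k) where k
    // is the number of distinct values seen so far; one pass over n elements
    // is therefore O(n log k), and reading the map back out is sorted order.
    static int[] sort(int[] arr) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int v : arr) counts.merge(v, 1, Integer::sum);
        int[] out = new int[arr.length];
        int i = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet())
            for (int c = 0; c < e.getValue(); c++) out[i++] = e.getKey();
        return out;
    }
}
```

Unlike counting sort, this needs no knowledge of the value range, only that the number of distinct values k is small.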