Google Interview Question
Software Engineer / DevelopersTeam: Graduate Recruitement Team
Country: Canada
Interview Type: Phone Interview
We can find median using random selection algorithm in linear time.
Actually here we are utilizing "5-random-elements method" to find a median in efficient way.
Pick 5 elements at random from
A[1..n], and set q to be their median.
What it is the probability that q is not a good pivot ?
• Let S be the elements of A[1..n] which are in the 10% smallest.
• The probability that an elements picked at random is in S is 0.1.
• q is in S only if at least 3 of the 5 elements that we pick are in S.
• The probability that this happens is
0.15 + 5•0.14 •0.9 + 10• 0.13 •0.92 =
all in S 4 in S,one not in S 2 not in S
= 0.00001 + 0.00045 + 0.00810=0.00856
• This is also the probability that q is in the 10% largest elements.
• In other words: with probability ≥98%, q is a good pivot.
Putting it together, during partition, each time that we need to find a pivot,
we use the “5 random elements” method.
With probability 98%, we find a good pivot.
The overall time that we spend on good partitions is much smaller than
the time we spent on bad partitions.
This algo can be apllied on QuickSelect or Selection algorithm.
_If_ the range of numbers, which are _integers_, is O(n) we can use:
int count[range]; //array of counts (or adjust range if lower limit isn't 0 )
Traverse your array whilst incrementing counts.
i.e., count[array[i]]++ for each index i from 0 to N-1.
Then linear sweep count array to find median in the obvious way.
accumulate=0;
for(i=0; i<range; i++)
if( 2*(accumulate += count[i] ) > N ) return i;
or maybe off by 1 errors above
O(n) guranteed, O(k) storage ( a priori ~ range is O(n) )
This is essentially bucket sort with bucket size=1 modified for finding medians (I just used the idea of quicksort -> quickselect). What would this be called? BucketOneSelect?
I don't think the limits of the range: O(n) is a valid assumption in general.
Even the assumption of integer number is questionable.
The method using quick select is a much better solution in practice that can use with any range of input and any type of input.
My post is one big implication, with hypotheses I stated :)
Yes, I'd use quickselect under "general" conditions.
Your comment runs fairly orthogonal to my post, and doesn't contradict it.
And careful with "better solution in practice" generalities.
public int findMedian(int[] a) {
int len = a.length;
if (len % 2 == 1)
return quickSelect(a, 0, len, len/2 + 1);
else
return (quickSelect(a, 0, len, len/2) + quickSelect(a, 0, len, len/2 + 1))/2;
}
public int quickSelect(int[] a, int left, int right, int rank) {
int pivot = a[rand(left, right)];
int leftEnd = partition(a, left, right, pivot);
int leftSize = leftEnd - left + 1;
if (leftSize == rank)
return max(a, left, leftEnd);
else if (leftSize < rank)
return quickSelect(a, leftEnd+1, right, rank-leftSize);
else
return quickSelect(a, left, leftEnd, rank);
}
public int partition(int[]a, int left, int right, int pivot) {
while (true) {
while (left <= right && a[left] <= pivot)
left++;
while (left <= right && a[right] > pivot)
right--;
if (left > right)
return left-1;
swap(a, left, right);
}
}
public int max(int[] a, int left, int right) {
int max = Integer.MIN_VALUE;
for (int i = left, i <= right; i++) {
if (a[i] > max)
max = a[i];
}
return max;
}
public int rand(int low, int high) {
return low + Math.random() * (high-low+1);
}
public void swap(int[] a, int left, int right) {
int tmp = a[left];
a[left] = a[right];
a[right] = tmp;
}
int quickselect(int a[], int n, int k)
{
assert(n >= k);
if (n==1) return a[0];
int i(0), lt(0), gt(n-1);
int p = median(a[0], a[n/2], a[n-1]);
while (i <= gt) {
if (a[i] == p) i++;
else if (a[i] < p) exch(i++, lt++);
else if (a[i] > p) exch(i, gt--);
}
if (k <= lt) return quickselect(a, lt, k);
if (k <= gt) return a[gt-1];
else return quickselect(a+gt+1, n-gt-1, k-gt-1);
}
int median(int a[], int n)
{
if (n&1) return quickselect(a, n, n/2+1);
else return (quickselect(a, n, n/2+1) + quickselect(a, n, n/2) ) /2.0;
}
You can also do this in O(n) time and O(n) space with two heaps -- a min heap and a max heap. Push the numbers onto the heaps keeping them equal in size or a max of 1 size difference. Median will be average of the tops of the two heaps if they are equal in size. If they are different sizes (max size difference of 1), median will be the top of the larger heap. This is also a way to continuously determine the median of a stream of integers of indeterminate size.
Here are the steps to find in linear ( o(n) time
1. Find the max and min elements of the list.
2. Take the avarage of the min and max element.
3. Now subtract the average element from the list.
4. Find the closest element which is close to 0, this will be the median of the list.
in C++ code
int FindNumber(int numbers[], int sequence, int start, int end)
{
if(sequence == start || sequence == end)
return numbers[sequence];
int m = numbers[sequence];
int l = start, r= end;
while(l < r)
{
while(numbers[l]>m)l++;
while(numbers[r]<m)r--;
int temp = numbers[l];
numbers[l] = numbers[r];
numbers[r] = temp;
}
if(sequence >= l) return FindNumber(numbers, sequence, l, end);
else return FindNumber(numbers, sequence, start, r);
}
int FindMedianNumber(int numbers[], int n)
{
return FindNumber(numbers, 4, 0, 9);
}
int numbers[9] = {9,2,4,3,8,5,7,6,1};
cout<< FindMedianNumber(numbers, 9);
Use Quick Select,
You have to check in n -> even or n -> odd
public static float getMedian(int arr[], int n) {
int p;
int low = 0;
int high = n-1;
p = partition(arr, low, high);
while(p != n/2) {
if(p < n/2) {
low = p+1;
} else {
high = p-1;
}
p = partition(arr, low, high);
}
int p1;
if (n % 2 == 1) {
return(arr[p]);
} else {
low = 0;
high = p-1;
p1 = partition(arr, low, high);
int n1 = n/2 - 1;
while(p1 != n1) {
if(p1 < n1)
low = p1 +1;
else
high = p1 -1;
p1 = partition(arr, low, high);
}
return((float)(arr[p] + arr[p1]) / 2);
}
}
#include<iostream>
using namespace std;
int sort(int * arr, int l,int h);
void partition(int * arr,int l, int h, int size);
void swap(int * arr, int a, int b);
void display(int * arr, int size);
void main()
{
cout<<"Enter the Size of arry"<<endl;
int size=0;
cin>>size;
int * arr=new int [size];
cout<<endl;
cout<<"Enter the array to sort"<<endl;
for (int i=0;i<size;i++)
cin>>arr[i];
display(arr, size);
partition(arr,0,size-1,size);
system("PAUSE");
}
void partition (int * arr,int l, int h, int size)
{
int pivot=0;
if(l>=h) return;
display(arr,size);
pivot=sort(arr,l,h);
partition(arr,l,pivot-1, size);
partition(arr,pivot+1,h, size);
}
int sort(int * arr, int l, int h)
{
int pivot=arr[l];
int i=l;
int j=h;
while(i<j)
{
while((pivot >= arr[i]) && (i<=h) )
i++;
while((pivot < arr[j]) && (j>=l) )
j--;
if(i<j)
swap(arr,i,j);
}
swap(arr,l,j);
return j;
}
void swap(int * arr, int a, int b)
{
int t=arr[a];
arr[a]=arr[b];
arr[b]=t;
}
void display(int * arr, int size)
{
for(int i=0; i<size;i++)
cout<<arr[i]<<",";
cout<<endl;
}
#include<iostream>
using namespace std;
int partition(int * arr, int l, int h,int m);
int search(int * arr, int l, int h);
void swap(int * arr,int a,int b);
void display(int * arr, int size)
{
for(int i=0;i<size;i++)
cout<<arr[i]<<",";
cout<<endl;
}
void main()
{
int size=0;
cout<<"Enter The Size of Array:"<<endl;
cin>>size;
int * arr=new int [size];
for(int i=0; i<size;i++)
{
cout<<"Enter the No::";
cin>>arr[i];
cout<<endl;
}
cout<<partition(arr,0, size-1,size/2);
cout<<endl;
display(arr,size);
system("PAUSE");
}
int partition(int * arr , int l , int h , int m)
{
int p=0;
p=search(arr,l,h);
if(p==m)
{
return arr[p];
}
else if(p<m)
{
partition(arr,p+1,h,m);
}
else
{
partition(arr,l,p-1,m);
}
}
int search(int * arr,int l,int h)
{
int p=arr[l];
int i=l;
int j=h;
while(i<j)
{
while(i<=h && p>=arr[i]) i++;
while(j>=l && p < arr[j]) j--;
if(i<j)
swap(arr,i,j);
}
swap(arr,l,j);
return j;
}
void swap(int * arr,int a,int b)
{
int t=arr[a];
arr[a]=arr[b];
arr[b]=t;
}
use Randomize selection algo or median of median Algorithm.
- varun October 05, 2013Median of median promises a tighter upperbound of O(n) as compare to randomize select O(n2)
But in practise randomize select is faster as compare to median of median as constants in median of median are large.
Edit:
In my opinion Median of Medians is slower as compare to QuickSelect with randomly chosen pivot in practice (never tested on large data).
Although worst case is O(N2) for Quickselect where as it is O(N) for median of Medians but it is near to impossible to produce worst case scenario for randomly chosen pivot.
I just posted an article on quick select on my blog, too lazy to explain it here as well, interested user can find the code + explanation + linear time complexity analysis (brief) at below link:
ms-amazon.blogspot.in/2013/10/quick-select-select-kth-smallest.html
In case I get some time, I'll post Median of Medians as well.
Hope it helps.
cheers :)