Google Interview Question
Software Engineer / DevelopersCountry: United States
Assumption before solving the problem:
Definition of Median: if the size of array is odd, such as {1, 2, 3, 4, 5}, 3 is the median number; if the size of array is even, such as {1, 2, 3, 4, 5, 6}, I will assume that either 3 or 4 is median.
Basic Idea:
suppose we have two arrays:
A = {2, 4, 6, 8, 10}
B = {1, 3, 5, 7, 9, 11, 13}
We compare the two median number, Median(A) = 6, and Median(B) = 7.
since Median(A) < Median(B), we can remove the first half of A, that is {2, 4}, and the second half of B, that is {9, 11, 13}, because the mutual median number cannot be in these two parts. In order to keep the mutual median number unchanged, in this case, we can remove {2, 4} and {11, 13}. As we remove two elements that are less than mutual median and two elements that are larger than mutual median, the original mutual median will remain same.
So that after removal, A and B changed to be:
A = {6, 8, 10}
B = {1, 3, 5, 7, 9}
We can continue using this method until we find the mutual median number, below is the implementation in Java (may have some small bugs in the code)
public static int getMutualMedian(int[] A, int[] B) {
// if A is an empty array, return the median of array B
if (A.length == 0) return getMedian(B, 0, B.length-1 );
// if B is an empty array, return the median of array A
if (B.length == 0) return getMedian(A, 0, A.length-1);
// calling for recursive helper function
return getMutualMedian(A, B, 0, A.length-1, 0, B.length-1);
}
// get the median number of one single array with specific range
public static int getMedian(int[] A, int start, int end) {
if (start > end) return 0;
int median = (start + end) / 2;
return A[median];
}
// get the median number of one single array and an extra integer
public static int getMedian(int a, int[] B, int startB, int endB) {
if (startB == endB) return a;
int median = (startB + endB) / 2;
if (a >= B[median]) return B[median+1];
return B[median];
}
// helper function to calculate the mutual median
public static int getMutualMedian(int[] A, int[] B, int startA,
int endA, int startB, int endB) {
//if (startA == endA && startB == endB) return A[startA];
// Base case: array A has only one element
if (startA == endA) return getMedian(A[startA], B, startB, endB);
// Base case: array B has only one element
if (startB == endB) return getMedian(B[startB], A, startA, endA);
int medianA = (startA + endA) / 2;
int medianB = (startB + endB) / 2;
// Special case: two median numbers are equal
if (A[medianA] == B[medianB]) return A[medianA];
int offset = 0;
if ((medianA - startA) >= (medianB - startB)) {
offset = endB - medianB;
} else {
offset = endA - medianA;
}
// update the range to search the median number
if (A[medianA] > B[medianB]) {
startB += offset;
endA -= offset;
} else {
startA += offset;
endB -= offset;
}
return getMutualMedian(A, B, startA, endA, startB, endB);
}
JAVA: The code is a bit clogged because I took care of the scenario where you need to take the average of both middle values if you get an even number of integers when merging both arrays. Any ways to make it a bit cleaner? (Aside from putting the code replicate in another function), something actually clever...
public int getMutualMedian(int[] A, int[] B){
boolean isEven = ((A.length + B.length) % 2 == 0)
int medianIndex = (A.length + B.length)/2;
int indexA = 0;
int indexB = 0;
int median = 0;
int median1 = 0;
int median2 = 0;
while(A[indexA] != null && B[indexB] != null && medianCount-- != -1){
if(A[indexA] < B[indexB]){
if (isEven){
median1 = median2
median2 = A[indexA++];
median = (median1+median2)/2;
} else
median = A[indexA++];
}
else {
if (isEven){
median1 = median2
median2 = B[indexB++];
median = (median1+median2)/2;
} else
median = B[indexB++];
}
}
while(A[indexA] != null && medianCount-- != -1){
if (isEven){
median1 = median2
median2 = A[indexA++];
median = (median1+median2)/2;
} else
median = A[indexA++];
}
while(B[indexB] != null && medianCount-- != -1){
if (isEven){
median1 = median2
median2 = B[indexB++];
median = (median1+median2)/2;
} else
median = B[indexB++];
}
return median;
}
Algorithm GetMutualMedian(A[m], B[n])
Input: sorted array A[m] and B[n]
Output: return the median of the combined array C[m+n]
if m=1, n=1 return A[m] or B[n]
if A[m/2] < B[n/2]: upper half of B is larger than lower half of B and lower half of A, thus upper half of B
should be larger than (m+n)/2 elements, recursively call GetMutualMedian(A[m], B[1-n/2])
if A[m/2] > B[n/2]: upper half of A is larger than lower half of A and lower half of B, thus upper half of A
should be larger than (m+n)/2 elements, recursively call GetMutualMedian(A[1-m/2], B[n])
if A[m/2] == B[n/2]: return A[m/2] or B[n/2]
Time complexity = log(m)+log(n)
could you give some explanation for "upper half of B is larger than lower half of B and lower half of A, thus upper half of B
should be larger than (m+n)/2 elements, "
probably with an example..thanks a lot
Could you please explain how it is B[1-n/2] means how it's "1-".. it is coming negative index?
logic seems good..
A divide-and-conquer solution for median has to find the Nth element among the partial lists after the first step of recursion. For example, if you have two lists of size 100, and you can eliminate the first 50 elements of the first list, then now you've reduced your problem to finding the 50th of 150 elements, but that's no longer a median.
First and foremost, I would try to find the Nth element among the lists, not the medians, and then I'd treat the median as a special case.
If constant time you can determine if Amax < Bmin or Bmax < Amin, and then you can find the Nth element in constant time as well.
If the size of both lists are small (say < 10), then simply merge them to find the Nth element.
If we have two large overlapping lists, then consider where the Nth element could be. Suppose size(A) = 13, size(B) = 30, and N=19. It's possible for any element of A to be the 19th element among the two lists, but it's clear that the first 6 elements of B have to be less than the 19th element, since at max 12 elements of A could be less than the 6th element of B. So we can recurse on finding the 13th element among all A and among 24 elements of B. For small values of N you can trim the larger list on the right using similar reasoning.
This problem is far from trivial to do efficiently, and it has lots of fiddly cases to deal with, but it's definitely amenable to some level of divide and conquer.
I'm not sure this captures every edge case, but it's a sketch of divide-and-conquer.
def nth_smallest(lst1, lo1, hi1, lst2, lo2, hi2, n):
if lo1 == hi1:
return lst2[lo2+n]
if lo2 == hi2:
return lst1[lo1+n]
if n == 0:
return min(lst1[lo1], lst2[lo2])
offset1 = min([n/2, hi1 - lo1 - 1])
offset2 = min([n/2, hi2 - lo2 - 1])
print offset1, offset2
if lst1[lo1+offset1] < lst2[lo2+offset2]:
return nth_smallest(
lst1, lo1+offset1+1, hi1,
lst2, lo2, hi2,
n-offset1-1)
else:
return nth_smallest(
lst1, lo1, hi1,
lst2, lo2+offset2+1, hi2,
n-offset2-1)
def mutual_median(l1, l2):
n = len(l1) + len(l2)
mid = n / 2
return nth_smallest(l1, 0, len(l1), l2, 0, len(l2), mid)
int GetMutualMedian(int *A, int *B, int n, int m, int pos) {
if (n == -1) return B[pos];
if (m == -1) return A[pos];
int amid = pos/2 > n ? n : pos/2;
int bmid = pos/2 > m ? m : pos/2;
if (A[amid] > B[bmid]) {
if (bmid == m) return GetMutualMedian(A, B, n, -1, pos-bmid);
else return GetMutualMedian(A, B+bmid, amid, m-bmid, pos/2);
}
else if (A[amid] < B[bmid]) {
if (amid == n) return GetMutualMedian(A, B, -1, m, pos-amid);
else return GetMutualMedian(A+amid, B, n-amid, bmid, pos/2);
}
else {
if (n >= pos/2 && m >= pos/2) return A[amid];
else if (pos/2 > n) return GetMutualMedian(A, B, -1, m, pos-n);
else (pos/2 > m) return GetMutualMedian(A, B, n, -1, pos-m);
}
}
My answer is wrong. I assume b[0] > a[m-2]. Anyway, keep it.
my solution is O(1).
assume a has m elements, b has n elements.
assume m <= n.
The question is to find the (m+n)/2 in the combined array, (m+n)/2 is bigger than m.The miracle happens when we compare the b[(m+n)/2 - m ] and a[m-1]:
if b[(m+n)/2-m] >= a[m-1]
return b[(m+n)/2 - m];
else
return a[m-1] >= b[(m+n)/2-m+1] ? b[(m+n)/2-m+1] : a[m-1];
for example
a[] = { 1, 2, 5, 9}
b[] = { 8, 13 ,15 ,20, 21}
we want c[4] in combined array c.
compare b[0] = 8 and a[3] = 9
b[0] < a[3]
then compare b[1] and a[3],
b[1] > a[3] then the result c[4] = a[3] = 9.
the code explains well. Anyway, I explain more.
if b[(m+n)/2-m] >= a[m-1], then a[0], a[1], ... a[m-1], b[0], b[1], ... b[(m+n)/2-m-1] is smaller than b[(m+n)/2-m], which means it is c[(m+n)/2] in the combined array c.
Are you assuming b[0] >= a[m-1]? I think there is no such constraint. a and b are sorted, but they can interleave in arbitrary ways.
Here's how I know your answer can't be right: according to your code, the only possible answers are b[(m+n)/2 - m], b[(m+n)/2-m+1], and a[m-1]. So as long as m and n are fixed, according to you, there can only be 3 different array positions that can be the answer (none of the array indexes depend on anything other than m and n). Now consider the following construction:
a = {0, 0, 0, 0, ..., 0} (length m)
b = {1, 2, 3, 4, 5, ... , n} (length n)
Let's say we choose n to be significantly greater than m. Since the elements of b are also all greater than the elements of a, the median will lie somewhere in b. Specifically, like you said, at the (m+n)/2 - m th element in b. Let's say the answer to the median in that case happens to be b[r] (r = (m+n)/2 - m). Now, suppose we change the last element of a to be n+1. The median now has to be b[r+1]. Change the second-to-last element of a to be n+1 also, and the median now has to be b[r+2]. If another element in a is changed, the median is b[r+3]. This shows that there are more than 3 positions that can be the answer, and therefore no solution that proposes that the median will always be in one of 3 positions can be correct.
Divide-and-conquer algorithms don't need to divide the problem in half each time. From chapter 4 (Divide-and-Conquer) of CLR:
"Subproblems are not necessarily constrained to being a constant fraction of the original problem size. For example, a recursive version of linear search would create just one subproblem containing only one element fewer than the original problem."
This divide-and-conquer algorithm simply pops the smaller value off the head of one of the lists and recurses on the rest of the problem.
Pseudocode:
merge_sorted_lists(list1, list2)
if(list1 and list2 are empty)
return empty list
if(list1 is empty)
return list2
else if(list2 is empty)
return list1
else if(head of list1 < head of list2)
smaller = pop head of list1
else if(head of list2 < head of list1)
smaller = pop head of list2
return smaller + merge_sorted_lists(list1, list2)
Java:
import java.util.ArrayList;
public class MaidenMerge {
public static void main(String args[]) {
// TEST HARNESS
int[] array1 = new int[] {1, 3, 5, 7, 9};
int[] array2 = new int[] {2, 4, 6, 8, 10};
ArrayList<Integer> list1 = new ArrayList<Integer>();
ArrayList<Integer> list2 = new ArrayList<Integer>();
for(int i = 0; i < array1.length; i++)
list1.add(new Integer(array1[i]));
for(int i = 0; i < array2.length; i++)
list2.add(new Integer(array2[i]));
ArrayList<Integer> merged = maiden(list1, list2);
System.out.println(merged.toString());
}
// RECURSIVE METHOD
private static ArrayList<Integer> maiden(ArrayList<Integer> list1, ArrayList<Integer> list2) {
ArrayList<Integer> result = new ArrayList<Integer>();
int smaller;
if(list1.size() == 0 && list2.size() == 0)
// Both lists are empty. This is our base case.
return result;
// Set "smaller" to the lesser of the heads of list 1 or list 2
if(list2.size() == 0) {
// List 2 is empty
return list1;
} else if(list1.size() == 0) {
// List 1 is empty
return list2;
} else if(list1.get(0) < list2.get(0)) {
// The head of list 1 is smaller than the head of list 2
smaller = list1.get(0);
list1.remove(0);
} else {
// The head of list 2 is smaller than the head of list 1
smaller = list2.get(0);
list2.remove(0);
}
// result = smaller + the recursive result of the sublists
result.add(new Integer(smaller));
result.addAll(maiden(list1, list2));
return result;
}
}
int findKSmallest(int a[], int n, int b[], int m, int k)
{
if (n > m) return finKSmallest(b, m, a, n);
if (n==0) return b[k-1];
if (k==1) return min(a[0], b[0]);
int na = min(n, k/2);
int nb = m-na;
if (a[na-1] <= b[nb-1]) return findKSmallest(a, n-na, b, m, k-na);
else return findKSmallest(a, n, b, m-nb, k-nb);
}
int median(int a[], int n, int b[], int m)
{
int size = n+m;
if (size&1) return findKSmallest(a, n, b, m, size/2+1);
else return (findKSmallest(a, n, b, m, size/2+1) + findKSmallest(a, n, b, m, size/2) ; )/2.0;
}
Isn't the problem too trivial if the arrays are already sorted? It sounds like you only to apply a fraction of what is needed in the standard merge sort algorithm. Basically, you could retrieve the total number of element and divide that by 2, then you can do the simple loop where as long as you haven't reached the end of your left and right array you increment the pointer on the array on which the current element is the lowest, when you have counted to total size / 2, return that value. That is, if you are looking for the Median...
If you take that approach,then the search can become linear if the input arrys are of very different size... One very small and the other very large.
The difficulty is indeed to achieve log M + log N efficiency and it is not trivial at all.
Do you mean median?
- Frank March 01, 2013