## Facebook Interview Question

SDE1s**Country:**United States

Working solution in Java. Array elements are added one after the other so can be fed from other machines as well.

```
import java.util.*;
class Main {
static PriorityQueue<Integer> minHeap = new PriorityQueue<>();
static PriorityQueue<Integer> maxHeap = new PriorityQueue<>(new Comparator<Integer>(){
public int compare(Integer a,Integer b){
return b.compareTo(a);
}
});
public static void main(String[] args) {
System.out.println("Hello, world!");
int[] arr = {4,3,1,8,4,7,6};
for(int i: arr){
findmedian(i);
}
}
public static void findmedian(int x){
if(maxHeap.isEmpty() || x<maxHeap.peek()){
maxHeap.add(x);
}else{
minHeap.add(x);
}
rebalance();
printmedian();
}
public static void rebalance(){
if(Math.abs(minHeap.size()-maxHeap.size())>1){
PriorityQueue<Integer> biggerHeap = (minHeap.size()>maxHeap.size())?minHeap:maxHeap;
PriorityQueue<Integer> smallerHeap = (minHeap.size()>maxHeap.size())?maxHeap:minHeap;
int element = biggerHeap.remove();
smallerHeap.add(element);
}
}
public static void printmedian(){
PriorityQueue<Integer> biggerHeap = (minHeap.size()>maxHeap.size())?minHeap:maxHeap;
PriorityQueue<Integer> smallerHeap = (minHeap.size()>maxHeap.size())?maxHeap:minHeap;
if(biggerHeap.size()==smallerHeap.size()){
float median = (float)(biggerHeap.peek()+smallerHeap.peek())/2;
System.out.println("ans:"+median);
}else{
System.out.println("ans:"+biggerHeap.peek());
}
}
}
```

if Range of the Keys is limited

eg if you are finding a median Age use Counting Sort to find the median

Credits : -Intro to Algo CLRS

{{

A-> Given Unsorted Array N-> number of Elements in Array; m - Range of Values

Create a New array getCount ---Set index of m 0 to m-1 to 0

for i=0 to N-1 //loop through the Array A

Index= A[i]-1

getCount [Index] ++; //increment the counting table

}}

After this run the following

{{

N-> number of Elements in Array A; m - Range of Values getCount > array produced above

Set Variable lessThan to 0

j=0

while j is less than m-1 or

if(getEqual[j] >0)

lessThan =getEqual[j] +lessThan

if(lessThan==N/2 and N%2==0) // Edge case where A is even and the median needs to be average of 2 Central elements

{

return (LastVariable+m+1)/2;

}

if(lessThan>N/2)

{

return m+1;

}

var LastVariable=m+1

}

}}

- the common approach is to use the quicksort partition method.

partition will pick a guess of a median and place smaller elements

on the left side of the array, equal elements in the middle part and

greater element on the right side and return the index of start and end

of middle part.

This is applied repeatedly until the array is separated

into two equal sized parts (considering the even length case). It has

the challenge of selecting a good pivot element...

One needs to consider as well odd and even length arrays, see code below.

- how to distribute:

one approach could be to guess the median, by just

determening the median on one machine. This would work if the

distribution accross machines is uniform and if a statistical

error is accepted

- alternatively one could use median of median method, where

every machine calculates a median and at the end median

of medians is calculated. This will be a pretty good aproximation,

but it's still an approximation.

- first machine does guess a good median, e.g. by using

median of medians, then it promotes this median to

other machines, each machine will then send back, how many

elements are left and right of that element, this information

is then used to repeat the process (look left or right)

the problem here is the communication among the machines, the slowest

or potentially a dead machine will slow the progress, further machines

don't work until the intermediate step has been completed by all others.

- a combination of those steps would be good, every machine

calculates the median +/- 10 elements and sends it to a coordinator.

the coordinater then determines median of medians and determines if it's

within the elements provided, if so, it can calculate the perfect

median, if not it will get back, with the best aproximation...

- what should be considered is that in a real life application

there is probably no point in time where the data won't change (e.g. grow).

so, if the perfect mean needs to be found and the algorithm is supposed

to terminate under all circumstances, some sort of snapshot mechanism

is required.

anyway here the median based on partition

```
def partition(data, begin, end):
pivot_idx = (begin + end) // 2 # midle, better randomize or median of median
pivot = data[pivot_idx]
data[end], data[pivot_idx] = data[pivot_idx], data[end] # move pivot to the end
piv_begin = 0 #[begin..piv_begin) < pivot
piv_end = 0 #[piv_begin..piv_end) = pivot
idx = 0 #[piv_end..idx) > pivot
while idx <= end:
if data[idx] < pivot:
data[piv_end], data[idx] = data[idx], data[piv_end] # swap
data[piv_begin], data[piv_end] = data[piv_end], data[piv_begin] # swap
piv_end += 1
piv_begin += 1
elif data[idx] == pivot:
data[piv_end], data[idx] = data[idx], data[piv_end] # swap
piv_end += 1
idx += 1
return (piv_begin, piv_end - 1)
def median(data):
n = len(data)
if n == 0: return None
if n == 1: return data[0]
begin = 0
end = n - 1
while begin < end:
m_beg, m_end = partition(data, begin, end)
if m_beg > (n - 1) // 2:
end = m_beg
elif m_end < (n - 1) // 2:
begin = m_end
else:
begin = (n - 1) // 2
end = (n - 1) // 2
if n % 2 == 1: return data[begin] #odd case, one element
return (data[begin] + min(data[begin + 1:])) / 2 #even case average of the two midle elements
print(median([1,2,3])) #2
print(median([3,2,1,4])) #2.5
print(median([2,1])) #1.5
print(median([3])) #3
print(median([9,9,9,1,1,1])) #5
```

This problem is almost same as "Running Median" problem. It can be found in book "Cracking the coding interview 6th ed: - Gayle Laakmann". You will need 1 max heap and 1 min heap. When inserting, keep both of them balanced. The median will be the average of top element of both the heaps if they both have same number of elements, otherwise it will be the top element of the heap with more elements.

@krupen.ghetiya

the question asks for the *median* of all elements from a dataset where you have random access to the elements. Your answer is for a *running median* problem, that is the median can be queried any point in time from the seen elements. Typically in running median data arrives from a stream, which means no random access.

As a result, the two heaps *running median* is not linear in time, it's O(lg(N)). ...

Distribution is a different beast, too because it would be safe to assume the data won't fit into the memory of a single machine and the two heaps solution would need some major tricks/assumptions/modifications to work in most cases

The Linear Median finding algorithm divides the input (A) into groups of size 5, finds the medians of medians, and use it as the pivot to partition the data; then according to the locations of pivot, it decides to proceed in the left or right of it.

This "Magical" algorithm (as explained in CLRS) is in O(n). Here is its implementation in c++. Please note that it works for ant k-th smallest value and for the median the input k should be A.size()/2

- a.asudeh July 22, 2017