Google Interview Question


Country: United States
Interview Type: Phone Interview




Comment hidden because of low score. Click to expand.
3
of 3 vote

By using maxHeap (containing bottom 50% of the numbers we read), and minHeap(containing top 50% of the number we read) it would only cost O(log(n)) to keep updating median. as you keep reading the nextNumber, c if the

if (number > minHeap.peek())
	minHeap.push(number);
	balance();
    else 
	maxHeap.push(number);
	balance();

where balance method is supposed to keep the balance between min and max tree by comparing their sizes, (size difference shouldnt be bigger than 1) and update the median, accordingly (median = minHeap.peek() if minHeap.size() > maxHeap.size(), and so on)

- kevin k. August 28, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
1
of 1 vote

// insert current element to the left or right heap and get median so far
public int getMedian(final int current, final int med, final MaxHeap<Integer> left, final MinHeap<Integer> right) {
    final int balance = left.size() - right.size();
    int median = med;

    // both heaps are of equal size.
    if (balance == 0) {
        // need to insert in left
        if (current < median) {
            left.insert(current);
            median = left.top();
        }
        // need to insert in right
        else {
            right.insert(current);
            median = right.top();
        }
    }
    // left heap is larger
    else if (balance > 0) {
        // need to insert in left
        if (current < median) {
            right.insert(left.extract());
            left.insert(current);
        }
        // need to insert in left
        else {
            right.insert(current);
        }

        median = (left.top() + right.top()) / 2;
    }
    // right heap is larger
    else if (balance < 0) {
        // need to insert in left
        if (current < median) {
            left.insert(current);
        }
        // need to insert in left
        else {
            left.insert(right.extract());
            right.insert(current);
        }

        median = (left.top() + right.top()) / 2;
    }

    return median;
}

public int getStreamMedian(final int[] stream) {
    int median = 0;
    final MaxHeap<Integer> left = new MaxHeap<Integer>();
    final MinHeap<Integer> right = new MinHeap<Integer>();

    for (int i = 0; i < stream.length; i++) {
        median = getMedian(stream[i], median, left, right);
    }
    return median;
}

- zahidbuet106 September 06, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Huh?? What am I missing here? Call readNextNumber() till it hits null; while reading push data to a vector (talking C++). Sort the vector, divide size by 2 and you have the median. Sorting takes the longest time - O(nLNn)

- thewhatwhat August 28, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
2
of 2 votes

You are not getting data at once. Its a stream of data. Consider someone sending the data intermittently over the network. So as soon as you get the some data, you need to find the median.

You will repeatedly sorting the data and finding the median. But since you already have sorted results from the previous iteration, when you add a new number to the list you, you should not be sorting the whole list again. That's where the heaps come.

The self-balancing BST (height balanced BST) will not work because the numbers of nodes in the left subtree can be more than night. We need to use the min and max heap combination. All the numbers less than median will be in max heap and the ones greater than median in the min heap. So when you add a new number, if it is less than median you add to max heap. Take out the top element from max heap, make it the median and move the previous median to the min heap. Visa versa if the number being added is less than median.

Time complexity wise it will be Log(n) as insertion and deletion for a binary heap is Log(n) which is way better than n Log(n)

- don August 30, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

requirement is O(n) or less

- lucifer1986 August 30, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

@don - A self-balancing BST will absolutely work if you augment each node in the tree with a "size" attribute which holds the number of nodes in the subtree. Inserting new elements only affects the "size" attributes of nodes "local" to the insertion path. Thus updating the "size" attribute does not change the asymptotic time complexity of the insert operation (this holds also for delete). Each insertion is O(log n). Finding ANY order statistic at any time is O(log n).

- Dave September 01, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Will the stream contains duplicate integers?

- minglotus6 August 29, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Assume it can. Using a two-heap, recursive-partition or order statistic tree will work in either case because they track, as their primary concern, the numbers of elements above and below some partition value. The values themselves are the secondary consideration.

- Dave September 01, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

You have a couple of choices. One is O(n) but will only tell you the median of an accumulated set of numbers. The other uses a minheap and maxheap - this will be O(1) to find the median once you are asked for it but requires O(log n) each time a new number is added to the set.

For the first approach you use the partition subroutine from quicksort to recursively partition the set - ideally roughly in two halves per recursion. To achieve an even split, select a pivot value using a linear time median finding function, such as "median of medians" (if you only need O(n) "expected" runtime then you can use a randomised pivot selection which is simpler to implement in an interview). At each level of recursion you have, in O(1) time, the order statistic of the pivot value. You compare this to the desired order statistic and either return the pivot value as a match, or enter the next level of recursion into the partition containing your desired element. This method can be used to find ANY order statistic, and as a byproduct also yields the set of values above and below that statistic - eg the 10 largest numbers in the set. The algorithm is O(n) because you have at most O(log n) recursions and at each level you have to scan half as many elements as the previous level.

A Python example implementation of approach two follows - we define the "median" of an even number of elements to be the greater of the two medial values:

def find_median(line):
    seq = (int(i) for i in line.strip().split(' '))
    left_heap = Heap(MAX)
    right_heap = Heap(MIN)
    left_heap.push(-(2 ** 63))
    right_heap.push(2 ** 63 - 1)
    for num in seq:
        if right_heap.size > left_heap.size:
            if num < right_heap.peek():
                left_heap.push(num)
            else:
                left_heap.push(right_heap.replace(num))
        else:
            if num > left_heap.peek():
                right_heap.push(num)
            else:
                right_heap.push(left_heap.replace(num))
    return right_heap.peek()

- Dave September 01, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

You could also use an Order Statistic Tree, which is an augmented self-balancing binary tree, such as a RedBlack Tree. Implementation of this in an interview could be a bit much as you can see below - I haven't even bothered implementing "delete":

RED = "red"
BLACK = "black"
NIL = None


class Node(object):

    def __init__(self, key, value=None):
        super(Node, self).__init__()
        self.key = key
        self.value = value
        self.parent = NIL
        self.left = NIL
        self.right = NIL
        self.colour = BLACK
        self.size = 0

    def __str__(self):
        return "NIL" if self == NIL else str(self.key)

# noinspection PyRedeclaration
NIL = Node(None)


class OSTree(object):

    def __init__(self):
        super(OSTree, self).__init__()
        self.root = NIL

    def os_select(self, i):
        node = self.root
        while node != NIL:
            rank = node.left.size + 1
            if i == rank:
                return node
            if i < rank:
                node = node.left
            else:
                i -= rank
                node = node.right
        return node

    @staticmethod
    def os_rank(node):
        rank = node.left.size + 1
        while node != NIL:
            if node == node.parent.right:
                rank += node.parent.left.size + 1
            node = node.parent
        return rank

    def search(self, key):
        node = self.root
        while node != NIL and node.key != key:
            if key < node.key:
                node = node.left
            else:
                node = node.right
        return node

    def insert(self, key, value=None):
        node = Node(key, value)
        node.size = 1
        x = self.root
        while x != NIL:
            node.parent = x
            x.size += 1
            if node.key < x.key:
                x = x.left
            else:
                x = x.right
        if node.parent == NIL:
            self.root = node
        else:
            if node.key < node.parent.key:
                node.parent.left = node
            else:
                node.parent.right = node
            node.colour = RED
            self._insert_fix(node)

    def _rotate_left(self, x):
        y = x.right
        y.parent = x.parent
        if y.parent == NIL:
            self.root = y
        elif y.parent.left == x:
            y.parent.left = y
        else:
            y.parent.right = y
        x.right = y.left
        if x.right != NIL:
            x.right.parent = x
        y.left = x
        x.parent = y
        y.size = x.size
        x.size = x.left.size + x.right.size + 1

    def _rotate_right(self, x):
        y = x.left
        y.parent = x.parent
        if y.parent == NIL:
            self.root = y
        elif y.parent.left == x:
            y.parent.left = y
        else:
            y.parent.right = y
        x.left = y.right
        if x.left != NIL:
            x.left.parent = x
        y.right = x
        x.parent = y
        y.size = x.size
        x.size = x.left.size + x.right.size + 1

    def _insert_fix(self, node):
        while node.parent.colour == RED:
            if node.parent == node.parent.parent.left:
                uncle = node.parent.parent.right
                if uncle.colour == RED:
                    uncle.colour = BLACK
                    node.parent.colour = BLACK
                    node.parent.parent.colour = RED
                    node = node.parent.parent
                else:
                    if node == node.parent.right:
                        self._rotate_left(node.parent)
                        node = node.left
                    node.parent.colour = BLACK
                    node.parent.parent.colour = RED
                    self._rotate_right(node.parent.parent)
            else:
                uncle = node.parent.parent.left
                if uncle.colour == RED:
                    uncle.colour = BLACK
                    node.parent.colour = BLACK
                    node.parent.parent.colour = RED
                    node = node.parent.parent
                else:
                    if node == node.parent.left:
                        self._rotate_right(node.parent)
                        node = node.right
                    node.parent.colour = BLACK
                    node.parent.parent.colour = RED
                    self._rotate_left(node.parent.parent)
        self.root.colour = BLACK

    def delete(self, node):
        pass

- Dave September 01, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

/*
* Algorithm:
*For the first two elements add smaller one to the maxHeap on the left, and bigger one to the minHeap on the right. Then process stream data one by one,
*
*Step 1: Add next item to one of the heaps
*
* if next item is smaller than maxHeap root add it to maxHeap,
* else add it to minHeap
*
* Step 2: Balance the heaps (after this step heaps will be either balanced or
* one of them will contain 1 more item)
*
* if number of elements in one of the heaps is greater than the other by
* more than 1, remove the root element from the one containing more elements and
* add to the other one
*
* Then at any given time you can calculate median like this:
* If the heaps contain equal elements;
* median = (root of maxHeap + root of minHeap)/2
* Else
* median = root of the heap with more elements
*/

- vivekh September 03, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

+1

- dhs.du.08 September 06, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

+1

- dhs.du.08 September 06, 2015 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More