Mike Lewis
BAN USERThis looks about like how I answered it. Note that you can end up getting slightly inaccurate results from this with weird distributions (i.e. server 1's top 21 results are a1..a20 and foo, server 2's top 21 results are b1..b20 and foo; if a1..a20 and b1..b20 have no overlap, foo probably ought to be in the final top 20 but won't get returned to the centralized server)
I have been asked this question many times at big data companies. It's worth being able to refine your answer and be able to think about what you'll do if, say, the centralized server goes down.
I would be shocked if my initial answer wasn't sufficient in an interview, but if they actually needed you to re-expand the compressed data into the original sorted list, it's pretty straightforward:
ArrayList<Integer> output;
for (int i = 0; i < 1024; i++) {
for (int j = 0; j < sort[i]; j++) {
output.add(i);
}
}
1) why wouldn't it get the correct answer for the case of 5 million 2s and 5 million 3s?
2) as far as needing extra memory, it doesn't require O(N), it only requires 1024 extra ints, so 4096 bytes plus minimal overhead for the definition of the array, and that doesn't need to grow unless N gets larger than an int (and even then it just doubles in size when you use longs, and technically it could be an unsigned int in languages that support that so N needs to be twice as big). That doesn't seem like a big concession to make in return for sorting in O(N) time, and if there isn't anything important in the original data's ordering, you could actually look at this as a really efficient form of compression and throw out the original list.
alternatively, you can read as much as does fit into memory and then reverse that content and write it to a temp file, then at the end join all the temp files together.
- Mike Lewis November 30, 2012Note that you don't really need to do a second pass; the link you want to break is the one that joins the subtree with the lowest total with the rest of the tree.
I wrote this in gosu, but it should be fairly obvious how it works. I used a global variable to avoid having to do a second pass.
uses java.util.ArrayList
public class node {
var children: ArrayList<node>
var value: int
var treetotal: int
var calculated: boolean
public construct (v: int) {
children = new ArrayList<node>()
value = v
treetotal = 0
calculated = false
}
public function addChild (n: node) {
this.children.add(n)
}
}
var low: node = null
function findLowest(n: node) {
for (child in n.children) {
if (!child.calculated) {
findLowest(child)
}
n.treetotal += child.treetotal
}
n.treetotal += n.value
if ((low == null) || (n.treetotal < low.treetotal)) {
low = n
}
n.calculated = true
}
var a = new node(3)
var b = new node(2)
var c = new node(1)
var d = new node(-6)
var e = new node(-1)
var f = new node(0)
var g = new node(4)
a.addChild(b)
a.addChild(c)
a.addChild(d)
b.addChild(e)
d.addChild(f)
d.addChild(g)
findLowest(a)
print(low.value)
The initializing code at the bottom builds a tree that looks like:
3
/ | \
2 1 -6
/ / \
-1 0 4
This returns -6, which is the value of the node whose link with its parent should be broken. Obviously if you have nodes with duplicate values you'd need to name the nodes somehow so you can return a unique identifier.
- Mike Lewis November 21, 2012Assuming that there's no sub-ordering within the pairs, so it doesn't matter which of the objects associated with 0 comes first, what we're going to do is instead of having sort[0] = the number of zeroes, we're going to have sort[0] = ArrayList<Object> where that list is all of the objects associated with 0. Adding to the front of a linked list is O(1) so that's nice and fast. Does that make sense?
- Mike Lewis November 21, 2012For that amount of experience, you'll need to do something other than your job history to get noticed. Work on an open source project, release an iOS or Android app, something like that. Basically you'll need something that sets you apart. I didn't start getting recruited by top-end companies until I had a really good run at my previous job and got a bunch of promotions and extra responsibilities very quickly.
- Mike Lewis November 17, 2012Imagine that you need to sort and store the following one-digit numbers:
098723459681623612093851092837561827349878210
now, you could sort all of those and store them in order, or you could just have an int[10] and for each number, add to that cell in the array. so for the first digit, you do sort[0]++, for the second digit you do sort[9]++, etc.
At the end, rather than actually having the list, you have a summary of the list. You know that the list starts with four zeroes, but instead of storing that as 0000, you just have sort[0] = 4. To reproduce the list in sorted order, you would need to loop through the entire list and for each index, print that index a number of times equal to the value in that cell.
part 1:
int[1024] sort;
for (i in list) {
sort[i]++;
}
which is O(n).
part 2: if the pairs are really just integer and object, I would probably just have an array of linked lists of some sort, which will still be O(n) (get the first item in the appropriate list and add the new item before that)
yes. good luck with your interviews!
- Mike Lewis January 27, 2013