## Google Interview Question for Software Engineers

Country: UK
Interview Type: In-Person


Part 1: The arrays need to be sorted first, and only then can the function be applied. This can be accomplished by splitting the sorting across the machines, merging the results, and then exchanging parts of the arrays for application of the function (a form of map-reduce). Heap sort is particularly useful here because it produces elements in sorted order while the sort is still running, so the function can already be applied to those elements.
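A minimal sketch of this idea, with the chunked sorting standing in for per-machine work and a k-way merge combining the results (`apply_f_distributed` and its parameters are hypothetical names, and F is assumed to pair elements by sorted position):

```python
import heapq

def apply_f_distributed(foo, bar, f, machines=3):
    """Sort each array in per-machine chunks, k-way merge the sorted
    chunks, then apply f to positionally matched pairs."""
    def chunked_sort(arr, k):
        n = len(arr)
        size = (n + k - 1) // k
        # Each chunk would be sorted on a separate machine.
        return [sorted(arr[i:i + size]) for i in range(0, n, size)]

    merged_foo = list(heapq.merge(*chunked_sort(foo, machines)))
    merged_bar = list(heapq.merge(*chunked_sort(bar, machines)))
    return [f(a, b) for a, b in zip(merged_foo, merged_bar)]
```

Because `heapq.merge` is lazy, the pairs could in principle be consumed as they are produced, which matches the point about heap sort yielding sorted elements early.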

Part 2: Either run a small subset first to estimate whether the workload is uniformly distributed and divide it statically based on that, or adapt during processing according to the load on each machine.
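The adaptive variant can be sketched as a shared work queue: idle workers pull the next small chunk, so faster machines naturally take more of the load. Here threads stand in for machines, and `process_adaptively` is a hypothetical name:

```python
from queue import Queue, Empty
from threading import Thread

def process_adaptively(pairs, f, workers=3, chunk=2):
    # Fill a shared queue with small chunks of work.
    q = Queue()
    for i in range(0, len(pairs), chunk):
        q.put(pairs[i:i + chunk])
    results = []

    def worker():
        # Each worker pulls the next chunk whenever it is idle.
        while True:
            try:
                batch = q.get_nowait()
            except Empty:
                return
            for a, b in batch:
                results.append(f(a, b))  # list.append is thread-safe in CPython

    threads = [Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Result order depends on scheduling, so callers that need order would have to tag each chunk with its index.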

Part 3: Use a master-election algorithm (e.g. a gossip-based protocol).
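As an illustration only, a bully-style election rule can be reduced to "the lowest-id reachable node wins" (real protocols like gossip or Paxos-based election are far more involved; `elect_master` and `is_alive` are hypothetical):

```python
def elect_master(node_ids, is_alive):
    # Bully-style rule: among all reachable nodes, the one with the
    # lowest id becomes the new master.
    alive = [n for n in sorted(node_ids) if is_alive(n)]
    if not alive:
        raise RuntimeError("no reachable node to elect")
    return alive[0]
```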

From what I understand, by splitting the sorting between machines you mean sorting the arrays partially? I find it confusing that you mention both MapReduce and heap sort.

Thanks.

Why do you assume they need to be sorted? What if they're objects that we can't order in any meaningful way?


1. I don't think we need to sort both arrays (which would cost O(n log n)). For the first array, hash every value, mapping it to its element; this is O(n), since you make a single pass over the array. Then scan the second array and, for each element, look it up in the hash map; if it exists, pull the matching element and apply F to the pair.

[This assumes the values are fixed and not a stream of data.]

2. Balancing is relatively easy: hash the first array (or split it, hash each part on a different machine, and combine the hash maps, which is basic MapReduce). Then you can split the second array and have each machine operate on a different part.

3. As in the first answer above, elect a new master (like it's done in Hadoop).
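The hash-based matching in point 1 is essentially a hash join; a small sketch, assuming a `key` function that identifies matching elements (all names here are hypothetical):

```python
def hash_join_apply(foo, bar, key, f):
    # Build a hash map over the first array once: O(len(foo)).
    index = {key(a): a for a in foo}
    out = []
    # Probe with the second array: O(len(bar)) expected-time lookups.
    for b in bar:
        a = index.get(key(b))
        if a is not None:
            out.append(f(a, b))
    return out
```

Elements of `bar` with no counterpart in `foo` are simply skipped, which matches the "if it exists, apply F" logic above.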


I believe quicksort makes this feasible: partition the array, send each part to a different machine, and so on recursively.


I refer to Foo as A, Bar as B.

1. Assuming every element in A has exactly one corresponding element in B, sort both A and B, then apply F(A[i], B[i]) for i = 0 to the length of A.

We can do this with just machines 1 and 2, but we really should also utilize machine 3. One simple approach is to partition (as in quicksort) the arrays A and B into 24 parts (a count that divides evenly among the 3 machines), then put 8 parts of each array onto each machine. Each machine can then sort each sub-partition and run F on corresponding values from A and B as described above.

2. I assume that by scale up, this means you end up with more spare machines. Same idea as above: we want to place the same number of elements onto each machine by partitioning the arrays so that each machine gets the same number of elements to process, then let each machine handle its load at its own pace. The nice thing about the recursive partitioning is that it doesn't have to be done by a single machine. Once you partition an array, the two halves no longer depend on each other and can be treated independently on different machines. This means that once machine 1 breaks A into A[0...i] and A[i+1...A.length-1], it can offload A[i+1...A.length-1] to be processed by another machine.
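One partition-and-offload step might look like the sketch below. A key assumption (not stated above) is that both arrays are partitioned around the same pivot value, so matching values land in the same half; `split_for_machines` is a hypothetical name:

```python
def split_for_machines(a, b, pivot):
    # One quicksort-style partition step. Using the same pivot on both
    # arrays keeps matching values in the same half, so each half can be
    # shipped to a different machine and processed independently.
    a_lo = [x for x in a if x <= pivot]
    a_hi = [x for x in a if x > pivot]
    b_lo = [x for x in b if x <= pivot]
    b_hi = [x for x in b if x > pivot]
    return (a_lo, b_lo), (a_hi, b_hi)
```

After this step, `(a_hi, b_hi)` can be offloaded to another machine while the current machine keeps recursing on `(a_lo, b_lo)`.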

3. We need to somehow distribute the control. In the extreme case, every machine could know what every other machine is working on and report progress to the others. Ideally, you would have a balance of masters and slaves, where only the master coordinates the slaves and the slaves communicate with the master, while still knowing about the other slaves in the same cluster. If the master becomes non-responsive, a new master needs to be elected from another slave in the cluster; that slave figures out what work was done and what work remains, and continues from there.

I like that you split and count on the same machine without needing to merge the results. However, this works only if "matching" always means that A[i] and B[i] are the ones that match.
