Microsoft Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: Phone Interview
I think you didn't understand the question. Every node should know the sum of all the values.
He said "wait for the parents to report the eventual sums", the final value will be flushed top-to-bottom so all the nodes will have the sum of all the values.
Just for my understanding, is there the notion that the Nodes would be on separate machines perhaps and some of these calls would be accomplished through something like CORBA, RPC, etc?
I think the Python code is awesome.
The only problem is that why you choose binary tree?
If a node could have a three children, would this be faster?
@Kevin, My solution only makes sense for a distributed computation where the actual computation is expensive. Assume for the sake of the exercise that network latency is negligible compared to the cost of the computation, and that summing two numbers takes a full second of computation time.
Even for a relatively expensive computation, doing a binary tree is way overkill for small values of N. For a small number of nodes, you're best off just having all nodes send their value to a single master node and let the master node do its thing.
But let's assume N=12, and let's assume a really fast network. With a binary tree, the nodes will take this many seconds before their computation is finished
12: 0
11: 0
10: 0
9: 0
8: 0
7: 0
6: 1 (only adds 6 and 12 values)
5: 2
4: 2
3: 3 (has to wait for 6 before adding 6 to 3+7)
2: 4 (has to wait on 4 and 5)
1: 5( gets 3 at t3, adds 3 to itself, then gets 2 at t4)
That makes for a total of five seconds.
Now let's set it up so that nodes have three children:
1 is parent of 3, 4, 5
2 is parent of 6, 7, 8
3 is parent of 9, 10, 11
4 is parent of 12
Nodes 2, 3, and 4 start their computation immediately, but they have three sums to perform (themself + first child + second child + third), so they're busy a full three second. So they all finish at t=3 and send their value immediately to node 1. When node 1 gets those three values, it takes node 1 another three seconds to compute its sum. So the total running time is six seconds.
By giving the lower-id nodes more children, we've made them the bottleneck, undermining the power of a distributed system.
Oops, my description of the three-way scenario is a little off, but hopefully you get the idea. When each parent node has to do more work, the cost of the later computations goes up. For large numbers of N, you are effectively comparing 2log2(N) vs. 3log3(N). Even though log3(N) is always less than log2(N) for N > 1, it's greater than 2/3log2(N) for any large value of N.
Actually, 2 children vs. 3 children is gonna be very close, and I think a 3-child scheme may even be better. You can estimate the total cost with code like this:
def compute_time2(m, n):
if m*2 > n:
return 0
else:
# mostly accurate
return compute_time2(m*2, n) + 2
def compute_time3(m, n):
if m*3 > n:
return 0 # no children
else:
return compute_time3(m*3, n) + 3
def compute_time15(m, n):
if m*15 > n:
return 0 # no children
else:
return compute_time15(m*15, n) + 15
print compute_time2(1, 80000)
print compute_time3(1, 80000)
print compute_time15(1, 80000)
@showell30
Hi,
I think 3 children is better than 2 children. Generally speaking, if we have N nodes, a master with N-1 slaves are maybe the best solutions.
For instance, we have N=12 nodes;
binary tree:
1 -- 2 -- 4 -- 8
-- 9
-- 5 -- 10
-- 11
-- 3 -- 6 -- 12
-- 7
Triple tree:
1 -- 2 -- 5
-- 6
-- 7
-- 3 -- 8
-- 9
-- 10
-- 4 -- 11
-- 12
We assume computation time for adding two number is t
Ignore latency.
for binary tree:
T(4) = T(5) = 2 T(6) = 1 T(7) = 0
T(2) = 4 T(3) = 3
T(1) = 6
for triple tree:
T(2) = T(3) = 3
T(4) = 2
T(1) = 6
So the total is the same. Generally speaking, the time should be similar.
Yet the assumption is that we ignore transfer latency.
If we consider latency,triple tree will be better. Because much more data could be transferred concurrently.
Just a thought:
If each node knows its own node ID, and how many nodes are in the system, you might handle the problem this way:
Imagine the nodes arranged in a circle.
Part A: Node 1 starts by sending its val to node 2. Node 2 sends the val it receives, plus its own val, to node 3. Node n sends the total of vals to node 1.
Part B: Node 1 receives a message from node n, which contains the sum of the vals. It passes this sum to node 2, which resends the message around the circle.
Each node (except node 1) knows that the *second* message it receives will be the total sum. Node 1 knows that the *first* message it receives is the total sum, and the *second* message it receives means that all nodes have been informed of the sum.
So each node has essentially the same code, and the algorithm is linear; it sends twice as many messages as there are nodes.
The only problem with this solution is that you don't really take advantage of the parallelism of a distributed system. You want to come up with some divide and conquer scheme that allows multiple nodes to be computing partial sums simultaneously. See the Python binary tree solution, for example.
I think we can arrange all nodes in a circular single linked list. While traversing, in send(id,val), each time add previous value to val(just like we calculate sum for array of numbers) and in next node retrieve it using receive method. at the end val contains total sum. Now again reiterate list. this time val contains only total sum that occurred at last node in list. In this way all nodes can know sum of others. since traversing a list take O(n), time complexity is O(n)+O(n)=O(n). Any suggestions for improvement is appreciated.
As this is a parallel algorithm, we often assume the processors are multiples of 2
id = #my id - 1 // using id as 0 based
val = #myval
totalrounds = log2(n)
in parallel:
while round = 1 to totalrounds:
partnernode = id xor round // id of partner in 0 index
send(partnernode + 1, val)
val = val + receive(partnernode+1)
Done!
Let the nodes form a binary tree. Every node has 0 or 1 parents and between 0 and 2 children. Nodes with children wait for them to report their values and pass a partial sum up to their parents, then wait for the parents to report the eventual sums. For parallel nodes this runs in O(logN) time. The synchronous simulation below runs in linear time.
- showell30@yahoo.com February 20, 2013