Consider a system of n nodes n

Microsoft Interview Question for Software Engineer / Developers

2

of 2 votes

19
Answers
Consider a system of n nodes numbered 1 to n. Each node has its id(1 to n) and a value associated with it say val. Now Every node has a send method send(int to , int val) and receive method int receive(int from).

So if node 1 wants to send value , it does like this . send(1,val).

Using these two methods, write a distributed algorithm. Such that when the algorithm finishes, every node in the system knows the sum of the values of all the nodes in the system.
- sowmya February 20, 2013 in United States | Report Duplicate | Flag | PURGE
Microsoft Software Engineer / Developer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: Phone Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 6 vote

Let the nodes form a binary tree. Every node has 0 or 1 parents and between 0 and 2 children. Nodes with children wait for them to report their values and pass a partial sum up to their parents, then wait for the parents to report the eventual sums. For parallel nodes this runs in O(logN) time. The synchronous simulation below runs in linear time.

import random

class Node:
    def __init__(self, n, num_nodes, val, send, report_final_result):
        self.send = send
        self.report_final_result = report_final_result
        self.children = []
        self.sum = val
        if 2*n <= num_nodes:
            self.children.append(2*n)
        if 2*n+1 <= num_nodes:
            self.children.append(2*n+1)
        self.parent = n / 2
        self.values_to_wait_for = len(self.children)
        self.child_values = []
    def start(self):
        if self.values_to_wait_for == 0:
            if self.parent:
                self.send(self.parent, self.sum)
            else:
                self.report_final_result(self.sum)
    def receive(self, val):
        if self.values_to_wait_for > 0:
            self._handle_child_val(val)
        else:
            self._handle_parent_val(val)
    def _handle_child_val(self, val):
        self.sum += val
        self.values_to_wait_for -= 1
        if self.values_to_wait_for == 0:
            if self.parent:
                self.send(self.parent, self.sum)
            else:
                # we're the root
                self.report_final_result(self.sum)
                for child in self.children:
                    self.send(child, self.sum)
    def _handle_parent_val(self, val):
        self.report_final_result(val)
        for child in self.children:
            self.send(child, val)

def simulate(num_nodes):
    nodes = []
    pending_nodes = set()
    sum = 0
    def make_dispatch(sender):
        def f(n, val):
            # connect send to recipient's receive() method
            print 'sender %d sent %d to %d' % (sender, val, n)
            nodes[n-1].receive(val)
        return f
    def make_report_final_result(sender):
        def f(val):
            print sender, val
            assert val == sum
            pending_nodes.remove(sender)
        return f
    for i in range(num_nodes):
        node_val = random.randint(0, 100)
        sum += node_val
        node = Node(i+1, num_nodes, node_val,
                make_dispatch(i+1),
                make_report_final_result(i+1)
        )
        nodes.append(node)
        pending_nodes.add(i+1)
    for i in range(num_nodes):
        nodes[i].start()
    assert 0 == len(pending_nodes)

simulate(2000)

- showell30@yahoo.com February 20, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I think you didn't understand the question. Every node should know the sum of all the values.

- sowmya February 20, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

He said "wait for the parents to report the eventual sums", the final value will be flushed top-to-bottom so all the nodes will have the sum of all the values.

- Chih.Chiu.19 February 20, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Just for my understanding, is there the notion that the Nodes would be on separate machines perhaps and some of these calls would be accomplished through something like CORBA, RPC, etc?

- vmayer99 February 21, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I think the Python code is awesome.

The only problem is that why you choose binary tree?
If a node could have a three children, would this be faster?

- Kevin February 26, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Kevin, My solution only makes sense for a distributed computation where the actual computation is expensive. Assume for the sake of the exercise that network latency is negligible compared to the cost of the computation, and that summing two numbers takes a full second of computation time.

Even for a relatively expensive computation, doing a binary tree is way overkill for small values of N. For a small number of nodes, you're best off just having all nodes send their value to a single master node and let the master node do its thing.

But let's assume N=12, and let's assume a really fast network. With a binary tree, the nodes will take this many seconds before their computation is finished

12: 0
11: 0
10: 0
9: 0
8: 0
7: 0
6: 1 (only adds 6 and 12 values)
5: 2
4: 2
3: 3 (has to wait for 6 before adding 6 to 3+7)
2: 4 (has to wait on 4 and 5)
1: 5( gets 3 at t3, adds 3 to itself, then gets 2 at t4)

That makes for a total of five seconds.

Now let's set it up so that nodes have three children:

1 is parent of 3, 4, 5
2 is parent of 6, 7, 8
3 is parent of 9, 10, 11
4 is parent of 12

Nodes 2, 3, and 4 start their computation immediately, but they have three sums to perform (themself + first child + second child + third), so they're busy a full three second. So they all finish at t=3 and send their value immediately to node 1. When node 1 gets those three values, it takes node 1 another three seconds to compute its sum. So the total running time is six seconds.

By giving the lower-id nodes more children, we've made them the bottleneck, undermining the power of a distributed system.

- showell30@yahoo.com February 27, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Oops, my description of the three-way scenario is a little off, but hopefully you get the idea. When each parent node has to do more work, the cost of the later computations goes up. For large numbers of N, you are effectively comparing 2log2(N) vs. 3log3(N). Even though log3(N) is always less than log2(N) for N > 1, it's greater than 2/3log2(N) for any large value of N.

- showell30@yahoo.com February 27, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Actually, 2 children vs. 3 children is gonna be very close, and I think a 3-child scheme may even be better. You can estimate the total cost with code like this:

def compute_time2(m, n):
    if m*2 > n:
        return 0
    else:
        # mostly accurate
        return compute_time2(m*2, n) + 2

def compute_time3(m, n):
    if m*3 > n:
        return 0 # no children
    else:
        return compute_time3(m*3, n) + 3

def compute_time15(m, n):
    if m*15 > n:
        return 0 # no children
    else:
        return compute_time15(m*15, n) + 15

print compute_time2(1, 80000)
print compute_time3(1, 80000)
print compute_time15(1, 80000)

- showell30@yahoo.com February 27, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@showell30
Hi,
I think 3 children is better than 2 children. Generally speaking, if we have N nodes, a master with N-1 slaves are maybe the best solutions.

For instance, we have N=12 nodes;
binary tree:
1 -- 2 -- 4 -- 8
-- 9
-- 5 -- 10
-- 11
-- 3 -- 6 -- 12
-- 7

Triple tree:
1 -- 2 -- 5
-- 6
-- 7
-- 3 -- 8
-- 9
-- 10
-- 4 -- 11
-- 12

We assume computation time for adding two number is t
Ignore latency.
for binary tree:
T(4) = T(5) = 2 T(6) = 1 T(7) = 0
T(2) = 4 T(3) = 3
T(1) = 6

for triple tree:
T(2) = T(3) = 3
T(4) = 2
T(1) = 6

So the total is the same. Generally speaking, the time should be similar.
Yet the assumption is that we ignore transfer latency.
If we consider latency,triple tree will be better. Because much more data could be transferred concurrently.

- Kevin February 27, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 4 vote

Just a thought:
If each node knows its own node ID, and how many nodes are in the system, you might handle the problem this way:
Imagine the nodes arranged in a circle.
Part A: Node 1 starts by sending its val to node 2. Node 2 sends the val it receives, plus its own val, to node 3. Node n sends the total of vals to node 1.
Part B: Node 1 receives a message from node n, which contains the sum of the vals. It passes this sum to node 2, which resends the message around the circle.
Each node (except node 1) knows that the *second* message it receives will be the total sum. Node 1 knows that the *first* message it receives is the total sum, and the *second* message it receives means that all nodes have been informed of the sum.
So each node has essentially the same code, and the algorithm is linear; it sends twice as many messages as there are nodes.

- jazcap53 February 20, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

The only problem with this solution is that you don't really take advantage of the parallelism of a distributed system. You want to come up with some divide and conquer scheme that allows multiple nodes to be computing partial sums simultaneously. See the Python binary tree solution, for example.

- showell30@yahoo.com February 20, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

You're right. And very nice Python solution.

- jazcap53 February 20, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

we can do it like many to one mapping.in a loop we get value from each node and add that value in a specific node.at the end of loop we have the sum of all the node in 1 single node.then we can flood this value again to the other node..its like many to 1 mapping.is it right ??

- go4gold February 20, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

It's a Observer Design pattern. (i.e. Publisher- subscriber pattern) .

- skt February 20, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

concept is widely used in event handling / polling / push data model .

- skt February 20, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

it is more suitable to be a Mediator pattern ....

- Omribitan February 22, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

I think we can arrange all nodes in a circular single linked list. While traversing, in send(id,val), each time add previous value to val(just like we calculate sum for array of numbers) and in next node retrieve it using receive method. at the end val contains total sum. Now again reiterate list. this time val contains only total sum that occurred at last node in list. In this way all nodes can know sum of others. since traversing a list take O(n), time complexity is O(n)+O(n)=O(n). Any suggestions for improvement is appreciated.

- ajit February 20, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

As this is a parallel algorithm, we often assume the processors are multiples of 2

id = #my id - 1 // using id as 0 based
val = #myval
totalrounds = log2(n)
in parallel:
	while round = 1 to totalrounds:
		partnernode = id xor round // id of partner in 0 index
		send(partnernode + 1, val)
		val = val + receive(partnernode+1)

Done!

- Akshay February 20, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

#define TOTAL_NODE N  

int  distributeSUM()
  {
        
     int i        = MyId;
     int totalSum = MyVal;
     int temp=1;

      for (;temp < MyId; temp++)
        totalSum += receive(temp);

      send(MyId, MyVal)

      for (temp++; temp <= N; temp++)
        totalSum += receive(temp);
             
      return(totalSum);      


  }

- ashutosh141 February 23, 2013 | Flag Reply

CareerCup

Microsoft Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews