Microsoft Interview Question
Software Engineer / DevelopersCountry: India
Interview Type: In-Person
The problem is very similar to matrix chain multiplication, the only difference is the calculation of new comparison from previous ones.
Now look at it very carefully, it is clear that we need to use DP, and in DP also we need to calculate(remember) a matrix. Now because the not a be placed at any position from i to j (if we are calculating MAT[ i ][ j ]), so the complexity will go to O(N*N*N)
Now Here is the algorithm :
M[i][j] = min(p[k] + M[i][k-1] + M[k+1][j] + sum(M[r][r] for r=i:j) for k=i:j), please consider boundary conditions,
we set M[i][i] = p[i], not that firstly we sort the array of number and maintain their corresponding probabilities also(sort can be in any order).
For example like above construct a upper diagonal matrix, here each entry actually store the comparisons only. Now if you want to construct the tree, you need another matrix(B od same size which store node data), which tells who to be chosen as parent node at each place, so M[0][N-1] will give you cost and B[0][N-1] will give you root node.
Please let me know if am not clear..
in huffman tree, the element with higher frequency or probability can be reached in short time than element with low frequency or probability..
to have details of huffman encoding.. see CORMEN's Introduction to Algorithm
you are right..
1. sort the elements according to the probability => preorder of BST
2. build BST
so that Root element is the element with highest probability..
for the given case..by doing like this only drawback , 30 is searched faster than 16.
cobra: that's not correct. Consider a tree whose inorder traversal is 1, 2, 3, 4, 5, 6, .., N and whose probabilities are, for instance, .01, .02, .03, .035, .... Strictly ascending probabilities with respect to the in-order traversal.Your idea would place N as the root, N-1 as its left child, N-2 as the left child of that, ... Your tree would be a linked list. This can be worse in the expectation than a more balanced tree even if the root of the balanced tree isn't the node that has the highest probability.
i guess cobra's solution is correct. its just about to keep the probabilities as the keys to BST and info as satellite data.
It's not correct, for the reason I mentioned. Take a dataset like the one I mentioned. Run cobra's algorithm. Then run the dynamic programming algorithm described by another response. Then observe how the dynamic programming solution yields a better expected time. This shows that cobra's algorithm isn't optimal.
Use a splay tree, and insert the elements one-by-one in increasing order of probability. In a splay tree (which is a special type of BST), the most recently inserted/accessed elements occur at the top of the tree.
Note; I'm not sure this solution is correct.
You haven't given a good rationale as to why it would be correct. Just because the most frequent elements are near the top, doesn't mean they're arranged in the most optimal way, or if they are, you should give a proof or at least an argument. I think that, generally speaking, when you post a solution to a non-trivial problem, the burden is on you to show that the solution is correct.
Try comparing your results with the results of the DP approach, whose optimality is easy to see and prove. You'll probably find cases where your trees are suboptimal.
I feel the splay tree solution mentioned here is a good attempt ... but if nodes are considered in increasing order of probability for constructing the BST, then it will lead to a linked list like structure ... so this will not be an efficient BST ... Let me know if I am missing anything here
It appears to me that it might be simply done by using sorting the elements on the basis of their probabilities, and then putting them in the tree in a level-order fashion, i.e., if the elements after sorting are e1, e2, e3, e4, e5, e6 and so on, then e1 will be root, e2, e3 would be its children, e4, e5 would be e2's children and so on.
Here's my logic for that:
The expected time is Sum{P(i)*T(i)} for all i, right? and we need to minimize it. So, simple enough, i say. Put the Highest P(i)'s with the lowest T(i)'s... as many as you can. the highest P(i) would be at root, with t=1 (lowest) , followed by e2 and e3, the next highests getting the next lowest time (t=2).
Yes this problem can be solved by using dynamic programming in O(n3) time
- Anonymous August 02, 2012C(i,j) = min { C(i,r-1) + p(r) + C(r+1,j) + sum_{k=i}^{r-1} p(k) + sum_{k=r+1}^{j} p(k) } i <=j
i<= r <=j
= 0 if i = j-1 here we have no occurrence of non existing element node , i.e we only search nodes which are present
To understand this problem more clearly you can watch this video
youtube.com/watch?v=DEOebw3vmXs