Directi Interview Question
InternsCountry: India
Interview Type: Phone Interview
Here's my analysis.
For complexity less than n^2, it means that we don't check every pair of points which means there needs to be a way to order along both x and y simultaneously. This is not possible because for each point, Xi and Yi are independently chosen. So sorting on one axis does not guarantee any particular order on the other axis and sorting on both axis is not possible.
Makes sense? If not, whats wrong?
O(nlogn) to just count should be possible.
Sort the points considering the x_i, wlog assuming it is (x_1, y_1) ... , (x_n, y_n) with x_1 <= x_2 ... <= x_n.
For the moment assume the x_i are distinct (we should be able to modify to cater to that case).
Now we need a datastructure which can support:
Insert (x);
GetCount(y): Number of previously inserted numbers which are less than (or greater than) y.
This can be done using a balanced binary tree with some additional fields.
"This can be done using a balanced binary tree with some additional fields."
Very vague answer. Without conditions and details of the tree, this does not give a solution.
@axecapone: It is not vague. Please read the red-black tree bonus chapters of CLR and come back to this.
I am too lazy to type in the details, a slightly determined person with some basic data-structure knowledge should be able to figure this out.
(Of course, I might be mistaken completely, in which I apologize...)
@Anonymous
Firstly what is the tree made of? X values, Y values, f(x,y)?Without all these details, I don't see how red black trees solve this problem.
The fundamental question is "Can we atleast find the number of pairs in less than O(n^2)?" May be its not possible in which case no datastructure can help. Without an answer to this question and without sufficient details on your data structure, you can understand why I am saying your answer is vague.
Anonymous is right. That is a valid approach that correctly counts the number of "dominating" pairs. I do think it was a bit vague and unclear though, so I will describe the algorithm in more detail.
This algorithm counts the pairs in O(NlogN). This algorithm can also be used to print out all dominating pairs in O(NlogN + P) where P is the number of such pairs. So if the number of dominating pairs is subquadratic, we will achieve a subquadratic algorithm. If not, then of course we must require quadratic time.
We sort the inputs by ascending x_i. For elements whose x_i values are identical, sort them among themselves by descending y_i (this will properly handle duplicate x_i as I will show). Make a blank order statistic tree. This is a tree that lets us efficiently, in logN time, ask how many nodes are less than a certain value. It also allows logN insert. Now, start from the beginning of the sorted array, and for every value, ask the order statistic tree how many values seen before had a smaller y. We dominate all such values since by construction we have a greater x than all the pairs in the tree so far. In fact we are checking only against pairs with smaller x. Smaller or equal to, that is. But pairs with equal x won't matter because we sorted the y in descending order for equal x, and so the tree won't contain any values with equal x and smaller y. If the criterion had been >= x_i to dominate, we would have sorted y by ascending order within values that have the same x, thus counting all the pairs with equal x. After doing the lookup and adding to a running total (and possibly printing all relevant elements by in-order traversal), we will add the current element to the tree.
We will end up doing O(N) tree lookups and O(N) tree inserts, for a total O(NlogN) cost. The sorting we did initially on the array also cost O(NlogN), so the grand total is O(NlogN). If you printed the elements, in-order traversals cost (logN + nodes traversed), for a grand total of O(NlogN + P) for this entire algorithm.
I think it can be less than O(N*N).Because the order defined in this problem is transitivity, which means if A<B and B<C ,then we get A<C.That is to say when we have compared A and B ,B and C, and get the result A<B and B<C ,we needn't compare A and C at all. But now, I don't know how to use it effectively.
I think this problem should be solved easily using a quadtree since a point is 'dominant' with respect to another point if it is in the first quadrant with respect to that other point. This should give a fairly simple solution, with average case log(n) (to the base 4) search time once the point is found we can just extract all points that are in first quadrant and third quadrant with respect to it and form appropriate pairs.
Make a binary tree with a node having an extra field for counting number of elements
in it's right sub tree.
Now insertion will be like this:
Say root is (Xr,Yr) and element to be inserted is x,y
if x>Xr and y>yr , (x,y) will go to right sub tree and count for root node will be incremented
In all other cases coming node will be inserted in left subtree
Traverse the tree too get the sum of count
this can be done using Binary indexed tree with a little modification...the modification is that you have to update the BIT for all values of x first...then check one by one for each value of x from position 1 of the bit say it current bit...now calculate the number of indices of BIT less than the current bit which have indices set as 1. but while including those check whether the y coordinate of the current bit is bigger than the y coordinate of other bit at a lower in index position and if its true then only increase the count by 1. it will take O(nlogn) ..
we can first sort the list of co ordinates in descending order of x co ordinate --. (this takes O(nlogn). now start comparing the elements from the highest x cordinate
Sort the points according to X. Calculate the number of inversions, comparision with respect to y. nC2 - (number of inversions) is your answer.
- catchmrbharath November 28, 2012O(nlogn) for sorting, and O(nlogn) for calculating inversions.