Facebook Interview Question
Software Engineer / Developers
Country: United States
Interview Type: Phone Interview
link: theory.stanford.edu/~megiddo/pdf/leastdis.pdf
I think the first proof in this paper suggests an O(n^3) algorithm by showing that an optimal line passes through at least two of the points. Moreover, they give a more efficient algorithm. May be worth the read for those stuck (like me).
lol cool
we were on our way to proving that paper (for Euclidean distances) right in this careercup thread
including the part where, if notanexpert's conjecture were true (which he was on his way to proving), we could square the distances and eliminate the infinite solutions that arise from using the distance directly
then we could work with the simplified objective function, which has abs^2, making the functions nonsingular after differentiation
nothing too drastic in that paper
+1000 to @justin @notanexpert @den @nothing_special
Answer please? I will have someone look into this.
Assuming a 2D plane of doubles (approximating real numbers), you can derive a formula on paper (a two-variable minimization problem: find the partials, set them to zero, and solve).
Curious who asked you this.
it can be solved with O(n^2) candidate lines: for each pair of points, form a line (if a point is on the line, its perpendicular distance is zero, the minimum), then calculate the perpendicular distance from each point not in the pair to the current line (see Distance_from_a_point_to_a_line on Wikipedia),
sum the distances, and compare the sum to the minimum found so far; for the minimum sum, print the line through the two points
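A minimal sketch of that pairwise brute force (the vector-of-pairs representation and function names are my assumptions, not from the thread). Note the total work is O(n^3): there are O(n^2) candidate lines and each needs an O(n) distance sum.

```cpp
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

// Perpendicular distance from point p to the line through a and b.
double perpDist(std::pair<double, double> p,
                std::pair<double, double> a,
                std::pair<double, double> b) {
    double dx = b.first - a.first, dy = b.second - a.second;
    // |cross(b - a, p - a)| / |b - a|
    return std::fabs(dx * (p.second - a.second) - dy * (p.first - a.first))
           / std::hypot(dx, dy);
}

// Try every pair of points as the candidate line; return the smallest
// total perpendicular distance over all candidate lines.
double bestLineDistSum(const std::vector<std::pair<double, double>>& pts) {
    double best = std::numeric_limits<double>::max();
    for (size_t i = 0; i < pts.size(); ++i)
        for (size_t j = i + 1; j < pts.size(); ++j) {
            double sum = 0;
            for (size_t k = 0; k < pts.size(); ++k)
                sum += perpDist(pts[k], pts[i], pts[j]);  // zero for k == i or j
            best = std::min(best, sum);
        }
    return best;
}
```

For collinear inputs this returns 0, since the line through any pair contains every point.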
I wonder too.
What he said reminds me of the Sylvester–Gallai theorem, which isn't relevant here.
good question, by induction? my mathematical proof skills are below par: with two points, the minimum-perpendicular-distance line is the line through the two points; with three points, any line joining two points in the given set will be minimal. I could not find any counterexample to my solution, but I guess they are expecting a concrete proof on paper
Let { (x_1, y_1), ... , (x_n, y_n) } be your points (don't fill numbers in).
Now assume general form of the solution line ax + by = c, where a,b,c are parameters.
Derive the formula for the perpendicular distance between the above line and any general point (x_i, y_i).
Now create an overall objective function
Obj(a,b,c) = SUM{from i=1, to n} ( perp_dist( (x_i, y_i) , ax+by=c ) )
Now find dO/da, dO/db, dO/dc , set them to zero.
We should be able to get a,b,c as functions of x_i, y_i with a bunch of sigma notation going from i=1 to n.
The formula should be O(n), that is, sigma notations going from i=1 to n with constant work inside each sigma, and totally independent of what the actual points are (we do the proof above with general variables x_i, y_i).
Now we hard code the formula in our function and use it.
We do a linear pass on the array of points and accumulate whatever each "sigma" needs in the end.
{Missing is a proof that the critical point we get above by setting the three partials to 0 is actually a min., and not a saddle or max., but we work on faith here}
Now @notanexpert, from the result of finding the formulae for a,b,c we should be able to "see" by inspection if your conjecture is true or has possible counterexamples.
We should minimize the function F(a, b, c) = sum{i = 1, ..., n} |a*x[i] + b*y[i] + c| / (a^2 + b^2)^0.5
How do we get dF/da, etc., for a function that contains abs?
@den nice. I just returned from 10+ min. of frustration over this pain-in-the-neck point too (which comes up pretty early in the proof).
This can only mean we have to resort to abs(x) = sqrt(x^2) to make it differentiable "in some sense", but then you get singularities to deal with later in the proof (because the derivative will be |x|/x * dx/dvariable, which is not even defined at x=0). It can be done, but then we have to handle all these singularity points as possible critical points....
.... another point, pun intended...
The question seems funky. His/her professor probably meant "minimize the square of the perpendicular distances." Because think about it (including @notanexpert's conjecture): if the conjecture is not true, then we can get away with "translating" the solution line in the direction of the perpendiculars (which will all be in the same direction for any solution :P) and get equally minimal solutions, infinitely many of them. So it all seems fishy....
This does suggest that IF this question is a well-posed problem, notanexpert's conjecture would need to be true to force unique solutions.
Else, I think we should be minimizing squares of the perpendicular distances to prevent this "sliding to create infinitely many solution lines." This would also make the math way easier.
What do you think @den?
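For reference, the differentiation obstacle discussed above can be written out explicitly (a sketch using the sqrt(x^2) trick):

```latex
\frac{d}{da}\,\lvert u(a)\rvert
  = \frac{d}{da}\sqrt{u(a)^2}
  = \frac{u(a)}{\lvert u(a)\rvert}\,u'(a)
  = \operatorname{sign}(u(a))\,u'(a),
```

which is undefined wherever u(a) = 0, i.e. exactly when the candidate line passes through a data point.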
I'm curious to see a more detailed proof of notanexpert's conjecture as well.
We would need more than the conjecture being untrue (because his conjecture is about 2 points). If a solution has no points on it, then we can wiggle it to get infinitely many solutions.
Suppose "one" solution has k points on one side and k points on the other side of the line. Wiggle the line a little in the direction of its perpendicular: this takes away k*(wiggle) from one side's points and adds k*(wiggle) to the other's, so the new line has the same total perpendicular distance.
I'm pretty sure this is an exhaustive search for a special case, or the poster misunderstood the original question.
Yes, this is a workable solution. One minor correction: it would be O(n^3), not O(n^2), since you have to compute the distance of every point to each candidate line.
Proof of "at least two points are on the line L":
Suppose a solution with no point, or only one point, on it exists. Choose that point (or any point, if there is no point on the line) as the origin, marked 0.
Then we can rotate the line L about 0 by a small angle t, with t small enough that the line does not hit any other point. For a given point (x, y) outside L, suppose y > 0, and denote the distance from (x, y) to L(t) by f(t); then it is easy to compute that f'(0) = -x and f''(0) = -y.
Similarly if y < 0, we'll get f'(0)=x, f''(0)=y.
In either case, we'll have f''(0)<0.
Then let F(t) be the sum of all f's. We have F''(0)=\sum f''(0) < 0.
That is, it is impossible for t = 0 to be a minimum: either it is not a critical point, or it is a maximum. Q.E.D.
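For concreteness (a sketch, taking L(0) to be the x-axis and rotating about the origin), the distance from a point (x, y) with y > 0 to the line rotated by angle t is:

```latex
f(t) = y\cos t - x\sin t, \qquad
f'(t) = -y\sin t - x\cos t, \qquad
f''(t) = -y\cos t + x\sin t,
```

so f'(0) = -x and f''(0) = -y < 0, matching the claim above; the case y < 0 is symmetric.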
Sorry, I didn't log in. The key point is: when t is small enough, L won't hit any other points. This ensures that the function f(t) is smooth; otherwise there would be a "jump" when the line sweeps past other points.
We are trying to minimize the sum of the squared distances from the points to the line (ignoring the common sqrt(a^2 + b^2) denominator). So the objective function is:
$$\min \sum_{i = 1}^n (a x_i + b y_i - c)^2$$
We can expand it to:
$$\min \sum_{i = 1}^n (x_i^2 a^2 + y_i^2 b^2 + c^2 + 2 x_i y_i\, ab - 2 x_i\, ac - 2 y_i\, bc)$$
Let $$X = \sum_{i = 1}^n x_i^2, \quad Y = \sum_{i = 1}^n y_i^2, \quad C = \sum_{i = 1}^n 2 x_i y_i, \quad B = \sum_{i = 1}^n -2 x_i, \quad A = \sum_{i = 1}^n -2 y_i$$
Then the objective function is:
$$\min\; X a^2 + Y b^2 + n c^2 + C\, ab + B\, ac + A\, bc$$
Therefore, setting the partial derivatives with respect to a, b, c to zero, we only need to solve the following equations:
$$
2X a + C b + B c = 0 \\
2Y b + C a + A c = 0 \\
2n c + B a + A b = 0
$$
Therefore, the algorithm is to compute the X, Y, A, B, C first, which is O(n). Then solve the linear equations, which is O(1).
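One caveat: the objective is homogeneous of degree 2 in (a, b, c), so a = b = c = 0 always zeroes it; in practice one imposes a normalization such as a^2 + b^2 = 1 before solving. The O(n) accumulation pass itself might look like this sketch (the struct and vector-of-pairs input are my assumptions; the names follow the definitions of X, Y, A, B, C above):

```cpp
#include <utility>
#include <vector>

// Accumulators as defined above:
// X = sum x_i^2, Y = sum y_i^2, C = sum 2 x_i y_i,
// B = sum -2 x_i, A = sum -2 y_i.
struct Sums { double X = 0, Y = 0, A = 0, B = 0, C = 0; };

Sums accumulate(const std::vector<std::pair<double, double>>& pts) {
    Sums s;
    for (const auto& p : pts) {  // one O(n) pass over the points
        s.X += p.first * p.first;
        s.Y += p.second * p.second;
        s.C += 2 * p.first * p.second;
        s.B += -2 * p.first;
        s.A += -2 * p.second;
    }
    return s;
}
```

Solving the resulting 3x3 system is then O(1), independent of n.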
The output of the algorithm is (a, b, c)
the distance from a point (x1,y1) to a line: ax+by+c=0 is
| ax1+by1+c | / sqrt(a^2+b^2)
Please note that the solution above assumes the line is ax + by = c; therefore the distance function is |a x_1 + b y_1 - c| / sqrt(a^2 + b^2).
Since sqrt(a^2 + b^2) appears for every (x_i, y_i), we do not need to calculate it when doing comparisons.
Therefore, the objective function is actually to minimize (a x_i + b y_i - c)^2, assuming the line is in the form ax + by = c.
import java.awt.Point;
import java.util.ArrayList;
/**
*
* @author Marcelo Filho
* @email marcelolfilho@gmail.com
* @facebook facebook.com/idemax.green
*
*/
public class LessDistBetNPoints {

    /**
     * @param args
     */
    public static void main( final String[] args ) {
        final ArrayList<Point> nPoints = new ArrayList<Point>();
        nPoints.add( new Point( 1, 2 ) );
        nPoints.add( new Point( 3, 1 ) );
        nPoints.add( new Point( 4, 1 ) );
        nPoints.add( new Point( 6, 2 ) );
        nPoints.add( new Point( 6, 4 ) );
        nPoints.add( new Point( 4, 3 ) );
        nPoints.add( new Point( 2, 4 ) );
        nPoints.add( new Point( 4, 6 ) );
        nPoints.add( new Point( 7, 6 ) );
        nPoints.add( new Point( 7, 1 ) );
        LessDistBetNPoints.findOutALine( nPoints );
    }

    private static void findOutALine( final ArrayList<Point> nPoints ) {
        final int nPointsSize = nPoints.size();
        final ArrayList<Point> lessWalkWay = new ArrayList<Point>();
        Point lastPoint, currentPoint, nextCandidate;
        int currentNPointsSize;
        double distBetPoints, lessDist, minDist = 0;
        // Greedy nearest-neighbor walk: always move to the closest remaining point.
        lessWalkWay.add( nPoints.remove( 0 ) );
        while ( nPoints.size() > 0 ) {
            lastPoint = lessWalkWay.get( lessWalkWay.size() - 1 );
            currentNPointsSize = nPoints.size();
            lessDist = Double.MAX_VALUE;
            nextCandidate = null;
            for ( int i = 0; i < currentNPointsSize; i++ ) {
                currentPoint = nPoints.get( i );
                distBetPoints = Math.sqrt( Math.pow( lastPoint.x - currentPoint.x, 2 ) + Math.pow( lastPoint.y - currentPoint.y, 2 ) );
                if ( distBetPoints < lessDist ) {
                    lessDist = distBetPoints;
                    nextCandidate = currentPoint;
                }
            }
            if ( nextCandidate != null ) {
                lessWalkWay.add( nPoints.remove( nPoints.indexOf( nextCandidate ) ) );
                minDist += lessDist;
            }
        }
        final StringBuilder strB = new StringBuilder();
        for ( int i = 0; i < nPointsSize; i++ ) {
            currentPoint = lessWalkWay.get( i );
            strB.append( "{" );
            strB.append( currentPoint.x );
            strB.append( "," );
            strB.append( currentPoint.y );
            strB.append( "}" );
            strB.append( "," );
        }
        strB.deleteCharAt( strB.length() - 1 );
        System.out.println( "Minimum route: " + strB.toString() );
        System.out.println( "Minimum route: " + minDist );
        // Minimum route: {1,2},{3,1},{4,1},{4,3},{6,2},{7,1},{6,4},{7,6},{4,6},{2,4}
        // Minimum route: 20.113122279787035
    }
}
This problem can be solved by the total least squares method.
In the 2D plane its equation can be computed exactly. (Note, though, that the code below fits ordinary least squares, minimizing the vertical residuals y - (a0*x + a1), not the perpendicular distances.)
// y = a0x + a1
std::pair<double, double> Line(std::vector<std::pair<double, double> >& v) {
    double s_x_y = 0;  // sum of x*y
    double s_x = 0;    // sum of x
    double s_y = 0;    // sum of y
    double s_x2 = 0;   // sum of x^2
    for (size_t i = 0; i < v.size(); i++) {
        s_x_y += v[i].first * v[i].second;
        s_x += v[i].first;
        s_y += v[i].second;
        s_x2 += v[i].first * v[i].first;
    }
    double t1 = s_y * s_x - v.size() * s_x_y;
    double t2 = s_x * s_x - v.size() * s_x2;
    double a0 = t2 == 0 ? 0 : t1 / t2;  // slope
    t1 = s_x_y - a0 * s_x2;
    t2 = s_x;
    double a1 = t2 == 0 ? 0 : t1 / t2;  // intercept
    return std::make_pair(a0, a1);
}
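For contrast, here is a sketch of an actual orthogonal (total least squares) fit, using the standard closed form theta = (1/2) * atan2(2*Sxy, Sxx - Syy) on centered data; the function and type names are mine, not from the thread.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Orthogonal (total least squares) line fit: returns the angle theta of the
// line's direction and the centroid (cx, cy) the line passes through.
struct TlsLine { double theta, cx, cy; };

TlsLine fitOrthogonal(const std::vector<std::pair<double, double>>& v) {
    double cx = 0, cy = 0;
    for (const auto& p : v) { cx += p.first; cy += p.second; }
    cx /= v.size(); cy /= v.size();
    // Second moments of the centered data.
    double sxx = 0, syy = 0, sxy = 0;
    for (const auto& p : v) {
        double x = p.first - cx, y = p.second - cy;
        sxx += x * x; syy += y * y; sxy += x * y;
    }
    // Closed-form minimizer of the sum of squared perpendicular distances.
    return { 0.5 * std::atan2(2 * sxy, sxx - syy), cx, cy };
}

// Sum of squared perpendicular distances to the fitted line.
double residual(const TlsLine& L, const std::vector<std::pair<double, double>>& v) {
    double s = std::sin(L.theta), c = std::cos(L.theta), r = 0;
    for (const auto& p : v) {
        double d = s * (p.first - L.cx) - c * (p.second - L.cy);
        r += d * d;
    }
    return r;
}
```

For collinear input along y = x this recovers theta = pi/4 with (near-)zero residual.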
FYI, this is regression analysis using orthogonal distances. Seems to be a rather difficult question to answer over the phone if you don't know the math behind it.
- nothing special here November 02, 2013