Search code examples
algorithmperformancecomputational-geometry

Given two (large) sets of points, how can I efficiently find pairs that are nearest to each other?


I need to solve a computational problem that boils down to searching for reciprocally-nearest pairs of points between two sets. The problem goes something like this:

Given a set of points A and a set of points B in euclidean space, find all pairs (a,b) such that b is the closest point in B to a and a is the closest point in A to b.

The sets A and B are of approximately equal size, and we will call this size N. For my particular problem N is approximately 250,000.

The brute force solution is to compare every point against every other point, which has quadratic time complexity. Is there any more efficient algorithm for doing this?


Solution

  • A data structure I found very useful when I had to do nearest neighbour searches was a k-d-tree. Wikipedia has a nice overview and this is an excellent in-depth discussion of the algorithm if you're implementing your own (although a library may well exist already - you don't mention which language you're using). The most important thing about a kd-tree is that it allows nearest-neghbour searches to be performed in O(log N) time.

    In that way, you could produce two lists of - the members of A and their nearest neighbour in B and the members of B and their nearest neighbour in A - in O(N log N) time. Then, you could compare the lists to see which pairs match. Done naively, that's O(N^2), though you might be able to think of a way to do it faster.

    [edit] You've got me thinking; here's my second thought:

    for(a in A)
        b := nearest(B, a)
        if a = nearest(A, b)
            add (a, b) to results
        end if
    end for
    
    function nearest(X, y)
        return nearest member of set X to point y
    end function
    

    By my reckoning, that's O(N log N).