algorithmdata-structurescomputational-geometry

# How to find the nearest line segment to a specific point more efficently?

This is a problem I came across frequently and I'm searching a more effective way to solve it. Take a look at this pics:

Let's say you want to find the shortest distance from the red point to a line segment an. Assume you only know the start/end point (x,y) of the segments and the point. Now this can be done in O(n), where n are the line segments, by checking every distance from the point to a line segment. This is IMO not effective, because in the worst case there have to be n-1 distance checks till the right one is found.

This can be a real performance issue for n = 1000 f.e. (which is a likely number), especially if the distance calculation isn't just done in the euclidean space by the Pythagorean theorem but for example by a geodesic method like the haversine formula or Vincenty's.

This is a general problem in different situations:

• Is the point inside a radius of the vertices?
• Which set of vertices is nearest to the point?
• Is the point surrounded by line segments?

To answer these questions, the only approach I know is O(n). Now I would like to know if there is a data structure or a different strategy to solve these problems more efficiently?

To make it short: I'm searching a way, where the line segments / vertices could be "filtered" somehow to get a set of potential candidates before I start my distance calculations. Something to reduce the complexity to O(m) where m < n.

Solution

• Probably not an acceptable answer, but too long for a comment: The most appropriate answer here depends on details that you did not state in the question.

If you only want to perform this test once, then there will be no way avoid a linear search. However, if you have a fixed set of lines (or a set of lines that does not change too significantly over time), then you may employ various techniques for accelerating the queries. These are sometimes referred to as Spatial Indices, like a Quadtree.

You'll have to expect a trade-off between several factors, like the query time and the memory consumption, or the query time and the time that is required for updating the data structure when the given set of lines changes. The latter also depends on whether it is a structural change (lines being added or removed), or whether only the positions of the existing lines change.