Search code examples
algorithmdata-structureslanguage-agnosticcomputational-geometry

Data structure/algorithm for merging similar items


Requirements:

Given bunch of line segments in 2D plane I need to be able to merge all the ones that are similar or equal to each other.

For example if i was given two line segments (0, 0) - (1, 1) and (1, 1) - (2, 2). Those two lines are connected and has the same slope. Thus I can just merge those two together into one line (0, 0) - (2, 2)

For line segments (0, 0) - (1, 1) and (1.01, 1) - (2, 2). Even though their slopes are a bit different and they are not connected, its not that visible to human eyes so I would still merge those two into (0, 0) - (2, 2) in exchange for performance.

For line segments (0, 0) - (1, 1) and (0.5, 0.5) - (0.6, 0.6). The later is just a segment of the former thus its safe to just remove the later and only preserve the former.

Obviously I can do this in brutal force way O(n^2) but that takes too long. Is there a good algorithm/data structure that could help to reduce the run time?

Attempts:
Range tree: It seems like a natural fit since it supports range query (lines with similar slope). However insert/delete are not supported.

R tree: R tree supports querying elements that are close by using a rectangle. Using this I could first find all lines that are within the bound box and filter out the ones with slope difference > epsilon or distance > epsilon2. However I can't find a good description on the implementation (search looks well documented but insert/delete are very vague)

B tree: B tree looks kind of promising but it seems like my use case is not the prime usage of it. Not sure if it would be the right way to go.


Solution

  • You can use projective duality with your favorite proximity structure (kd-tree, cover tree, etc.) to cluster the segments into groups that are nearly collinear. Then for each group you can use the standard sweep line algorithm to compute the union of intervals as a list of disjoint intervals.

    To compute the projective coordinates for the line that contains (a, b) and (c, d), we embed the endpoints in projective space as (a, b, 1) and (c, d, 1) and then compute the cross product. The thing with projective coordinates is that they're not unique. My naive suggestion would be to use the unit sphere in 3D as a covering space by normalizing with respect to the Euclidean norm and duplicating the point at its antipode.

    In other words, we map (a, b) - (c, d) to (x', y', z') and (-x', -y', -z') where (x, y, z) = (b - d, c - a, ad - bc) and x' = x/sqrt(x^2+y^2+z^2) and y' = y/sqrt(x^2+y^2+z^2) and z' = z/sqrt(x^2+y^2+z^2).