Tags: optimization, multidimensional-array, data-structures, time-complexity, points

Is there a data structure that allows finding points close to each other efficiently?


I'm looking for a data structure.
Let's say you have n points p := (x, y) with x, y ∈ [-128, 128].
You initialize the data structure and add all n points to it.
Now, for any point, you want to efficiently find all points close to it.
More precisely:
Specify a radius r < 1 and a point p.
You want a function F that outputs an (unsorted) list of all points q with d(p, q) < r.
I'm looking for a data structure that allows this function to be optimized (the standard algorithm is O(n); can you do better than that?).
I'd be grateful for an answer :)
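
For reference, the standard O(n) algorithm is just a linear scan over all points. A minimal sketch of that baseline (the function name is my own):

```python
import math

def brute_force_query(points, p, r):
    """Naive O(n) baseline: scan all points, keep those within radius r of p."""
    return [q for q in points if q != p and math.dist(p, q) < r]
```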

For people who know their stuff and want to help even further:
Say the points move during intervals (each by a maximum distance of a < 2).
During every interval, F is called for every point (n times). We now want to extend the optimization so that F stays just as efficient after every interval.
So we want a function G that re-sorts the data structure.
Per interval, G is called once and F is called n times. We want O(G) + n·O(F) < O(n²).

In terms of the worst case there is really no room for improvement, so we make the assumption that in every interval, for every point p, at least 50% of all points lie outside the radius specified for F.

The values above are somewhat arbitrary and should be exchangeable with any other numbers; I chose them to make the problem easier to understand. In addition, x and y are floating-point numbers.


I'd be happy with an answer that points me to an article, a Wikipedia entry, or any other source that deals with the same or a similar problem. I really don't expect anyone to spend a whole day explaining a data structure to me ;)

Anyway all help is appreciated. Thank you very much.


Solution

  • This problem reminds me of a particle simulation I wrote some time ago, which had problems similar to the ones you describe. I found a data structure that allows for O(n) complexity (with a few minor deviations in practice, and assuming you choose a good number of chunks).

    You can divide the two-dimensional space into small rectangular chunks (I think squares are best in your case) with side length greater than r.

    Then you need O(n) time to sort the points into those chunks.
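
    A minimal sketch of that bucketing step, assuming a dict of lists keyed by integer chunk coordinates (all names here are mine; `cell` is the chunk side length, chosen greater than r):

    ```python
    import math
    from collections import defaultdict

    def build_grid(points, cell):
        """Bucket each point into the square chunk containing it.
        One dict insertion per point, so this runs in O(n)."""
        grid = defaultdict(list)
        for (x, y) in points:
            grid[(math.floor(x / cell), math.floor(y / cell))].append((x, y))
        return grid
    ```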

    Let k be the total number of chunks you end up with.

    Then finding all points within radius r of every point takes O(n·(n/k)) = O(n²/k), where n/k is the approximate number of points per chunk (assuming a roughly uniform distribution, which was true for the particle simulation; I'm not sure about your problem, though). Keep in mind that for every point you also need to look at the 8 neighboring chunks!
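
    Concretely, F could look something like the sketch below (reusing the `build_grid` layout above; because the chunk side length exceeds r, the 3×3 block of chunks around p is guaranteed to cover the whole query disc):

    ```python
    import math

    def query(grid, cell, p, r):
        """Return all points q with d(p, q) < r. Since cell > r, only p's
        own chunk and the 8 neighboring chunks can contain such points."""
        cx, cy = math.floor(p[0] / cell), math.floor(p[1] / cell)
        out = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for q in grid.get((cx + dx, cy + dy), ()):
                    if q != p and math.dist(p, q) < r:
                        out.append(q)
        return out
    ```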

    Then you also have an additional O(k) term, which comes from the fact that you need to iterate through the chunks to access their elements.

    So in total this data structure has a complexity of O(n²/k + n + k). Now, to find the relation between n and the optimal k, you have to find the minimum of the function f(k) = a·n²/k + b·n + c·k (with a, b, c the hidden constants), which can be done by taking the derivative and setting it equal to zero:

    f'(k) = -a·n²/k² + c = 0 ⇒ n²/k² = c/a = constant, so the optimal k is proportional to n. Therefore, if k can be chosen optimally:

    O(n²/k + n + k) = O(n²/n + n + n) = O(n)
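
    Spelled out as a worked derivation (including the second-derivative check, which confirms the critical point really is a minimum):

    ```latex
    f(k)   = a\frac{n^2}{k} + bn + ck
    f'(k)  = -a\frac{n^2}{k^2} + c = 0
             \;\Longrightarrow\; k^2 = \frac{a}{c}n^2
             \;\Longrightarrow\; k = \sqrt{a/c}\,n
    f''(k) = 2a\frac{n^2}{k^3} > 0 \quad\text{(a minimum)}
    ```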

    The worst case is of course still O(n²), e.g. when k = 1 (or when all points land in the same chunk).
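
    As for the follow-up question about moving points: with this structure, G can simply rebuild the chunks from scratch after every interval, which is again O(n). Per interval that gives O(G) + n·O(F) = O(n) + O(n²/k), i.e. O(n) in total when k is chosen proportional to n. A sketch reusing the helpers above (`step` is my name; the 50% / uniform-distribution assumptions from the question still apply):

    ```python
    def step(points, cell, r):
        """One interval: G re-buckets all n points in O(n), then F is
        answered for every point. With k ~ n chunks the n queries cost
        O(n^2/k) = O(n) in total under the stated assumptions."""
        grid = build_grid(points, cell)                        # G: O(n)
        return [(p, query(grid, cell, p, r)) for p in points]  # n * O(F)
    ```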