Search code examples
algorithmgeometryselectionsubsetsampling

How to select points at a regular density


how do I select a subset of points at a regular density? More formally,

Given

  1. a set A of irregularly spaced points,
  2. a metric of distance dist (e.g., Euclidean distance),
  3. and a target density d,

how can I select a smallest subset B that satisfies below?

  • for every point x in A,
  • there exists a point y in B
  • which satisfies dist(x,y) <= d

My current best shot is to

  • start with A itself
  • pick out the closest (or just particularly close) couple of points
  • randomly exclude one of them
  • repeat as long as the condition holds

and repeat the whole procedure for best luck. But are there better ways?

I'm trying to do this with 280,000 18-D points, but my question is in general strategy. So I also wish to know how to do it with 2-D points. And I don't really need a guarantee of a smallest subset. Any useful method is welcome. Thank you.


bottom-up method

  • select a random point
  • select among unselected y for which min(d(x,y) for x in selected) is largest
  • keep going!

I'll call it bottom-up and the one I originally posted top-down. This is much faster in the beginning, so for sparse sampling this should be better?

performance measure

If guarantee of optimality is not required, I think these two indicators could be useful:

  • radius of coverage: max {y in unselected} min(d(x,y) for x in selected)
  • radius of economy: min {y in selected != x} min(d(x,y) for x in selected)

RC is minimum allowed d, and there is no absolute inequality between these two. But RC <= RE is more desirable.

my little methods

For a little demonstration of that "performance measure," I generated 256 2-D points distributed uniformly or by standard normal distribution. Then I tried my top-down and bottom-up methods with them. And this is what I got:

bottom-up.norm top-down.norm

RC is red, RE is blue. X axis is number of selected points. Did you think bottom-up could be as good? I thought so watching the animation, but it seems top-down is significantly better (look at the sparse region). Nevertheless, not too horrible given that it's much faster.

Here I packed everything.

http://www.filehosting.org/file/details/352267/density_sampling.tar.gz


Solution

  • A genetic algorithm may probably produce good results here.

    update:

    I have been playing a little with this problem and these are my findings:

    A simple method (call it random-selection) to obtain a set of points fulfilling the stated condition is as follows:

    1. start with B empty
    2. select a random point x from A and place it in B
    3. remove from A every point y such that dist(x, y) < d
    4. while A is not empty go to 2

    A kd-tree can be used to perform the look ups in step 3 relatively fast.

    The experiments I have run in 2D show that the subsets generated are approximately half the size of the ones generated by your top-down approach.

    Then I have used this random-selection algorithm to seed a genetic algorithm that resulted in a further 25% reduction on the size of the subsets.

    For mutation, giving a chromosome representing a subset B, I randomly choose an hyperball inside the minimal axis-aligned hyperbox that covers all the points in A. Then, I remove from B all the points that are also in the hyperball and use the random-selection to complete it again.

    For crossover I employ a similar approach, using a random hyperball to divide the mother and father chromosomes.

    I have implemented everything in Perl using my wrapper for the GAUL library (GAUL can be obtained from here.

    The script is here: https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_density.pl

    It accepts a list of n-dimensional points from stdin and generates a collection of pictures showing the best solution for every iteration of the genetic algorithm. The companion script https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_gen.pl can be used to generate the random points with a uniform distribution.