how do I select a subset of points at a regular density? More formally,
Given
dist
(e.g., Euclidean distance),how can I select a smallest subset B that satisfies below?
dist(x,y) <= d
My current best shot is to
and repeat the whole procedure for best luck. But are there better ways?
I'm trying to do this with 280,000 18-D points, but my question is in general strategy. So I also wish to know how to do it with 2-D points. And I don't really need a guarantee of a smallest subset. Any useful method is welcome. Thank you.
y
for which min(d(x,y) for x in selected)
is largestI'll call it bottom-up and the one I originally posted top-down. This is much faster in the beginning, so for sparse sampling this should be better?
If guarantee of optimality is not required, I think these two indicators could be useful:
max {y in unselected} min(d(x,y) for x in selected)
min {y in selected != x} min(d(x,y) for x in selected)
RC is minimum allowed d, and there is no absolute inequality between these two. But RC <= RE
is more desirable.
For a little demonstration of that "performance measure," I generated 256 2-D points distributed uniformly or by standard normal distribution. Then I tried my top-down and bottom-up methods with them. And this is what I got:
RC is red, RE is blue. X axis is number of selected points. Did you think bottom-up could be as good? I thought so watching the animation, but it seems top-down is significantly better (look at the sparse region). Nevertheless, not too horrible given that it's much faster.
Here I packed everything.
http://www.filehosting.org/file/details/352267/density_sampling.tar.gz
A genetic algorithm may probably produce good results here.
update:
I have been playing a little with this problem and these are my findings:
A simple method (call it random-selection) to obtain a set of points fulfilling the stated condition is as follows:
A kd-tree can be used to perform the look ups in step 3 relatively fast.
The experiments I have run in 2D show that the subsets generated are approximately half the size of the ones generated by your top-down approach.
Then I have used this random-selection algorithm to seed a genetic algorithm that resulted in a further 25% reduction on the size of the subsets.
For mutation, giving a chromosome representing a subset B, I randomly choose an hyperball inside the minimal axis-aligned hyperbox that covers all the points in A. Then, I remove from B all the points that are also in the hyperball and use the random-selection to complete it again.
For crossover I employ a similar approach, using a random hyperball to divide the mother and father chromosomes.
I have implemented everything in Perl using my wrapper for the GAUL library (GAUL can be obtained from here.
The script is here: https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_density.pl
It accepts a list of n-dimensional points from stdin and generates a collection of pictures showing the best solution for every iteration of the genetic algorithm. The companion script https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_gen.pl can be used to generate the random points with a uniform distribution.