algorithm geospatial computational-geometry

find (near-)minimal covering set of discs on a 2-D plane

OK, say I've got a bunch of discs sitting on a plane in fixed known locations. Each disc is 1 unit in radius. The plane is fully covered by the set of discs, in fact, it is extensively over-covered by the set of discs, by an order of magnitude or two in some areas. I'd like to find a subset of the discs that still fully cover the plane. Optimal is nice, but not necessary.

Here's the before illustration:

toomanydiscs

And here's the after illustration:

justright

It seems to me that there's a dual problem having to do with Delaunay triangulation, but I'm not quite sure that helps me. I also know that this is similar, but not the same as, the disc covering problem in computational geometry. Is this a standard problem whose name I don't know?

Possible approaches seem to me to include growing a covering set using a local greedy search, and iteratively using a nearest-pair query to remove discs one at a time. I'm not sure if either is guaranteed to work well, and I haven't worked through the details.

Oh, and the application is, if you haven't guessed it, finding a subsample of ZIP code centroids to cover a map when making queries, so n is about 50,000.

Solution

Game Plan

The following is basically just a more precise restatement of your problem, but it might help:

Enumerate every connected region in the plane that results when the boundaries of all disks are drawn. By assumption, each of these regions is covered by 1 or more disks.
Each region is a "thing to be covered", and each disk is a "covering thing". Find the minimum set cover on this set of regions. This is NP-hard unfortunately.

This might not be exploiting all the structure available in the problem, but it will definitely give you an optimal answer.

Enumerating Regions

Enumerating the regions and recording which disks cover each in step 1 is the tricky part. Regions are not in general convex which makes intersection tests tricky, and every circle you add potentially doubles the number of regions. Here is how I would approach that:

Forget about the actual location of each region, and define a region only in terms of which disks it is inside and which it is outside. I.e. a region is defined by a length-n vector of 0/1 values, each indicating whether the region inside or outside that disk is to be included in the intersection -- the region in question is formed by intersecting all these n regions. So in principle you could have up to 2^n regions, but in practice some (most) vectors produce empty regions because they entail intersecting two disks that have no intersection -- this is easy to test for, thankfully. It should be straightforward to recursively generate all non-empty regions, except that...

Bad News

Unfortunately I now see that it is necessary to perform full intersection testing, because it's not always possible to tell when a region will be empty. The critical counterexample is that, given two disks A and B that have a small sliver of overlap and another disk C that overlaps each of A and B, depending on the positions of all 3 disks, the intersection of all 3 either may or may not be non-empty. (To see this, draw 3 disks in different colours with 50% opacity in a drawing program, and move them around.)

A Workable Hack

Since generating the exact list of non-empty regions looks like it will be a lot of work and take a long time due to intersection testing, and you claim you don't need optimal solutions, you could try just using a grid of sample points as the set of "things to be covered" instead of the exact list of non-empty regions. It's straightforward to determine which disks cover a given sample point. Then solve maximum set cover as before.

To get confidence that there are no gaps, rerun several times, randomly jittering the sample points' co-ordinates each time. Increase the density of sample points until there is no change in the final result.