Tags: python, cluster-analysis, computational-geometry, convex-hull, hierarchical-clustering

Convex hulls of hierarchical clustering in Python


I'm using hierarchical clustering to try to visualize a large set of data that has been flattened to two dimensions. What I want to do is create a visualization that allows me to look at the data from different heights in the hierarchy, by rendering clusters as the convex hulls of their constituent points. The toughest part of this problem is that I need an algorithm that can efficiently merge the convex hulls of pairs of clusters as I move up the hierarchy. I've seen a lot of algorithms for calculating the convex hull of a set of n points in O(n log n) time, but it seems as though it would be much more efficient in this case to exploit the substructure of the problem. I'm just not sure exactly how.

Edit:

For more information, the data structure is an array that begins with the original points of the clustering, and then says which points/clusters are combined to form each subsequent cluster. So it's kind of like a tree/pointer structure, but contained in one big array. The important part is that it's efficient to look up the two constituent clusters of any supercluster, but it's not efficient to get the set of all points belonging to a cluster. So any reasonable algorithm has to work from the bottom up.
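To make the bottom-up traversal concrete, here is a minimal sketch. It assumes the array can be read as a list of `(left, right)` index pairs, where indices `0..n-1` are the original points and the result of the i-th merge gets index `n + i` (the same convention SciPy's linkage matrix uses; whether the structure described here matches exactly is an assumption). The names `bottom_up`, `leaf_value`, and `combine` are hypothetical. Cluster size stands in for the per-cluster value; a hull-merging function could be passed as `combine` instead.

```python
def bottom_up(n_leaves, merges, leaf_value, combine):
    """Compute one value per cluster in a single pass over the merge list.

    merges[i] = (left, right) creates cluster n_leaves + i from two
    clusters that already exist, so their values are always available.
    """
    values = [leaf_value(i) for i in range(n_leaves)]
    for left, right in merges:
        values.append(combine(values[left], values[right]))
    return values


# Four points: 0 and 1 merge into cluster 4, 2 and 3 into cluster 5,
# then clusters 4 and 5 merge into the root, cluster 6.
merges = [(0, 1), (2, 3), (4, 5)]
sizes = bottom_up(4, merges, lambda i: 1, lambda a, b: a + b)
print(sizes)  # [1, 1, 1, 1, 2, 2, 4]
```

Each merge only reads the values of its two children, so no step ever needs the full point set of a cluster.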

So let's say we're in the middle of the hierarchy someplace, and the precomputed hierarchy says that clusters A and B are merged to produce cluster C. We're going from the bottom up, so we already have computed the convex hulls of the points in clusters A and B, so we simply need to combine them to produce the convex hull of cluster C. Cluster A's convex hull could actually be a single point, a pair, or a full polygon. Same goes for cluster B. So there are several cases for how these should be merged to form the convex hull of cluster C, but I'd bet there's a clever solution that would probably treat singletons and pairs the same way as polygons.

The most obvious solution would be to calculate the convex hull with the combined set of points from the convex hulls of clusters A and B. But I need to do this on a hierarchy of 100k points, so I'm wondering if there's a more efficient way to combine the convex hulls of A and B.
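For reference, the obvious approach might look like the sketch below. It uses Andrew's monotone chain algorithm (my choice for illustration, not something specified above) on the union of the two hulls' vertices. The function names are hypothetical, and points are assumed to be `(x, y)` tuples. Singletons and pairs need no special casing, but every merge costs O(h log h) in the combined number of hull vertices h because of the sort.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left (CCW) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: CCW hull starting at the smallest (x, y)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts  # a single point or a pair is its own hull
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Each chain ends where the other begins, so drop the last point of each.
    return lower[:-1] + upper[:-1]


def merge_hulls_naive(hull_a, hull_b):
    """Hull of the union, recomputed from the vertices of both hulls."""
    return convex_hull(list(hull_a) + list(hull_b))


print(merge_hulls_naive([(0, 0), (2, 0)], [(1, 1)]))
# [(0, 0), (2, 0), (1, 1)]
```

Only hull vertices are passed upward, so interior points are discarded at the first merge that encloses them.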

Edit 2:

         /----5
    1---/    / \
   / \      / B 8
  2 A 3  C 6   /
   \ /      \ /
    4--------7

Okay, so I attempted to ASCII out an illustration of what I mean. The convex hull of cluster A is 1-2-3-4, the convex hull of B is 5-6-7-8, and the convex hull of C is 1-2-4-7-8-5. Presumably, clusters A and B contain additional points inside their hulls, but these clearly cannot become part of the hull of C. So the problem is to find an algorithm that determines where to "splice" the hulls of clusters A and B to form the hull of C, based on the coordinates of the points. This is the inductive step of the whole process. (Eventually C will be combined with cluster D and so on, until the algorithm ends with the topmost cluster, whose convex hull is the convex hull of all the points.)


Solution

  • There are at least two convex hull merge algorithms that I'm aware of -- the rotating calipers method of Toussaint (section 5 of the paper) and the bridging algorithm of Preparata and Hong (see section 3 of the paper). Both of these algorithms take time linear in h = h1 + h2, where h1 and h2 are the numbers of hull vertices in the first and second convex hulls respectively.
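As a rough illustration of a merge that is linear in h, here is a sketch of a simpler alternative. To be clear, this is neither Toussaint's rotating calipers nor the Preparata-Hong bridging step. It relies on an assumed storage convention: each hull is a list of `(x, y)` tuples in counter-clockwise order, starting at its lexicographically smallest vertex (which is what Andrew's monotone chain produces). Under that assumption the vertices can be put into sorted order without sorting, and a monotone chain scan over the merged sequence then yields the combined hull. All function names are hypothetical.

```python
from heapq import merge as merge_sorted


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left (CCW) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def sorted_vertices(hull):
    """Hull vertices in (x, y) order, without a general-purpose sort.

    Assumes a CCW hull starting at its smallest vertex: the list then
    rises to the largest vertex and falls back, so merging the rising
    part with the reversed falling part gives the sorted order.
    """
    if not hull:
        return []
    far = max(range(len(hull)), key=hull.__getitem__)
    return list(merge_sorted(hull[:far + 1], reversed(hull[far + 1:])))


def chain(points):
    """One monotone-chain pass that keeps only strict left turns."""
    out = []
    for p in points:
        while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
            out.pop()
        out.append(p)
    return out


def merge_hulls_linear(hull_a, hull_b):
    """Combined hull of two hulls in O(h1 + h2), same convention as the inputs."""
    pts = []
    for p in merge_sorted(sorted_vertices(hull_a), sorted_vertices(hull_b)):
        if not pts or p != pts[-1]:  # skip coincident points
            pts.append(p)
    if len(pts) <= 2:
        return pts
    return chain(pts)[:-1] + chain(reversed(pts))[:-1]


# Coordinates chosen to resemble the ASCII sketch in the question.
hull_a = [(0, 2), (2, 0), (4, 2), (2, 3)]      # points 2, 4, 3, 1
hull_b = [(8, 2), (10, 0), (12, 3), (10, 5)]   # points 6, 7, 8, 5
print(merge_hulls_linear(hull_a, hull_b))
# [(0, 2), (2, 0), (10, 0), (12, 3), (10, 5), (2, 3)]
```

The result corresponds to 2-4-7-8-5-1 in the question's labelling, and single points and pairs go through the same code path as polygons. Since the output follows the same convention as the inputs, it can be fed directly into the next merge up the hierarchy.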