python cluster-analysis

How to know which is the exemplar for the cluster created by Affinity Propagation


I'm working on image matching and used Affinity Propagation in Python to cluster images. Since AP chooses an exemplar for each cluster, how do I find out which image is the exemplar of a given cluster?


Solution

  • Affinity Propagation has no canonical way to "classify" new images. Cluster membership is not determined by the affinity alone, but by the "responsibility" and "availability" messages. Roughly speaking (and I think the sklearn implementation is incorrect on this point), each object is assigned to its "nearest" exemplar, i.e. the one with the highest affinity; affinities are commonly derived from distances. But if the nearest exemplar has low availability, and the second nearest is almost as close but has much higher availability and responsibility for this point, then the point may be assigned to an exemplar other than the nearest one. I'm not sure whether some kind of transitivity (as in DBSCAN) can occur, too. If so, AP would handle clusters of varying diameter or shape better and be less similar to k-means. In my experiments, AP behaved much like k-means, only much slower. I don't have a simple test case for such a situation, though, and IMHO it will usually affect only a few points.

    Nevertheless, a common approach is to simply assign "new points" to the nearest exemplar, and sklearn's predict method does exactly this. I don't think this is a good idea, because fit followed by predict may then not produce the same result as fit_predict, but this is unlikely to change because of "backwards compatibility".

    sklearn also exposes the attribute cluster_centers_indices_, which holds the indexes of the exemplars chosen by AP, so you could easily do the assignment yourself, too (which is necessary if you used a precomputed affinity matrix).
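
A minimal sketch of how the exemplars could be looked up, assuming a precomputed affinity matrix of negative squared Euclidean distances. The toy data stands in for image feature vectors, and the damping and max_iter values and the nearest_exemplar helper are illustrative assumptions, not part of sklearn or of the original answer.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics.pairwise import euclidean_distances

# Hypothetical stand-in for image feature vectors: three tight 2-D groups.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Precomputed affinity: negative squared Euclidean distance (one common choice).
S = -euclidean_distances(X, squared=True)

# damping and max_iter are assumptions chosen to make convergence likely.
ap = AffinityPropagation(affinity="precomputed", damping=0.9,
                         max_iter=1000, random_state=0).fit(S)

exemplar_idx = ap.cluster_centers_indices_  # positions of the exemplars in X / S
labels = ap.labels_                         # cluster number of every sample
exemplar_of_sample = exemplar_idx[labels]   # exemplar position for every sample

for k, idx in enumerate(exemplar_idx):
    members = np.flatnonzero(labels == k)
    print(f"cluster {k}: exemplar is sample {idx}, {members.size} members")


def nearest_exemplar(sim_to_training):
    """Hypothetical helper: given one new item's similarities to all training
    samples, return the position of the most similar exemplar."""
    return exemplar_idx[np.argmax(sim_to_training[exemplar_idx])]


# Assign a new item by hand, since only similarities are available here.
new_point = np.array([[9.5, 10.5]])
sim_new = -euclidean_distances(new_point, X, squared=True)[0]
print("new point -> exemplar sample", nearest_exemplar(sim_new))
```

With images, exemplar_idx indexes into whatever list of images was used to build the affinity matrix, so the exemplar image of cluster k would be images[exemplar_idx[k]] for a list named images (a hypothetical name).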