python, machine-learning, scikit-learn, cluster-analysis

AffinityPropagation .labels_ vs .predict()


[Image: code comparing .labels_ and .predict()]

I'm doing clustering with AffinityPropagation from sklearn.

Using clustering.labels_ produces a different (albeit almost identical) result to doing clustering.predict on the same training data.

Any insight into why this is true?


Solution

  • Don't use predict with any clustering algorithm outside the k-means family.

    When you call fit, the result (labels_) is computed with affinity propagation.

    When you invoke predict, it is not actually running AP. Instead, it simply finds the nearest exemplar for each point. That may or may not give the same result, as you have observed. Since the nearest exemplar is most likely the one a point was assigned to during fitting, the two usually agree, but predict is not doing any form of affinity propagation; it is a nearest-neighbor classification against the exemplars.

    k-means is fine because its own assignment step already uses nearest-center logic, so predict reproduces it. But that does not generally hold for clustering. In general, clustering algorithms cannot predict for out-of-sample data; you need a classifier for that.
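To make the difference concrete, here is a minimal sketch. The data (make_blobs with three centers), the preference value, and the choice of KNeighborsClassifier are all assumptions for illustration, not taken from the question. It compares labels_ with predict on the training data, checks that predict matches a plain nearest-exemplar lookup, and then trains a separate classifier on the fitted labels for out-of-sample points.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances_argmin
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: three well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[1, 1], [-1, -1], [1, -1]],
    cluster_std=0.5,
    random_state=0,
)

# fit runs affinity propagation; labels_ holds its assignment.
ap = AffinityPropagation(preference=-50, random_state=0).fit(X)
fit_labels = ap.labels_

# predict only assigns each point to its nearest exemplar.
pred_labels = ap.predict(X)

# Reproduce predict by hand with a nearest-exemplar lookup.
nearest = pairwise_distances_argmin(X, ap.cluster_centers_)
same_as_nearest = np.array_equal(pred_labels, nearest)

# Count training points where the two assignments disagree (may be zero).
n_diff = int(np.sum(fit_labels != pred_labels))
print("predict equals nearest-exemplar lookup:", same_as_nearest)
print("points where labels_ and predict differ:", n_diff)

# For out-of-sample data, train a classifier on the fitted labels instead.
# KNeighborsClassifier is just one possible choice here.
clf = KNeighborsClassifier(n_neighbors=5).fit(X, fit_labels)
X_new = np.array([[0.9, 1.1], [-1.2, -0.8]])  # made-up new points
new_labels = clf.predict(X_new)
print("classifier labels for new points:", new_labels)
```

Whether n_diff is zero depends on the data, so no particular count should be expected. The classifier approach also works with clustering algorithms that have no predict method at all.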