Tags: python, scikit-learn, cluster-analysis

Scikit-learn: ARI score for cluster evaluation


I am calculating the Adjusted Rand Index (ARI) to evaluate clustering performance. Suppose the true and predicted clusters look like the following, where the format {i, "x"} means that element "x" belongs to the i-th cluster.

>>> from sklearn import metrics
>>> labels_true = [{0,"a"}, {0,"b"}, {0,"c"}, {1,"d"}, {1,"e"}, {1,"f"}]
>>> labels_pred = [{0,"a"}, {0,"b"}, {1,"c"}, {1,"d"}, {2,"e"}, {2,"f"}]
>>> metrics.adjusted_rand_score(labels_true, labels_pred)

The ARI score comes out as 1.0, but it seems it should not be 1.0, since the predicted clustering differs from the true one.

Is this a valid way to calculate the ARI score?


Solution

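  • Pass only the cluster labels to the ARI function, one label per element and in the same element order for both lists. Each {i, "x"} is a Python set, not a cluster label:
    from sklearn import metrics

    # Cluster id of elements a, b, c, d, e, f, in that order
    labels_true = [0, 0, 0, 1, 1, 1]
    labels_pred = [0, 0, 1, 1, 2, 2]
    metrics.adjusted_rand_score(labels_true, labels_pred)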
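If the data starts out as (cluster, element) pairs like in the question, the two label lists can be built by looking up each element's cluster in a fixed element order. The sketch below is one possible way to do that using only the standard library. The helper names `to_labels` and `adjusted_rand_index` are made up for this illustration and are not part of scikit-learn, and the pairs are written as tuples rather than sets as an assumption about what the question intended. The hand-rolled ARI is there only to show where the value comes from (pair counts in the contingency table), not as a replacement for `metrics.adjusted_rand_score`.

```python
from collections import Counter
from math import comb

# (cluster_id, element) pairs from the question, written as tuples
# because a set such as {0, "a"} has no defined order.
pairs_true = [(0, "a"), (0, "b"), (0, "c"), (1, "d"), (1, "e"), (1, "f")]
pairs_pred = [(0, "a"), (0, "b"), (1, "c"), (1, "d"), (2, "e"), (2, "f")]


def to_labels(pairs, elements):
    """Return the cluster id of each element, in the order given by `elements`."""
    cluster_of = {element: cluster for cluster, element in pairs}
    return [cluster_of[element] for element in elements]


def adjusted_rand_index(labels_a, labels_b):
    """Hand-rolled ARI from the contingency table (illustration only)."""
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: nothing to disagree on
    # Number of element pairs grouped together in both, in a, and in b.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / total_pairs
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial (one cluster or all singletons)
    return (sum_cells - expected) / (maximum - expected)


# Use one shared element order for both label lists.
elements = sorted(element for _, element in pairs_true)
labels_true = to_labels(pairs_true, elements)
labels_pred = to_labels(pairs_pred, elements)
score = adjusted_rand_index(labels_true, labels_pred)
print(labels_true, labels_pred, round(score, 4))
# prints: [0, 0, 0, 1, 1, 1] [0, 0, 1, 1, 2, 2] 0.2424
```

The resulting `labels_true` and `labels_pred` lists can be passed directly to `metrics.adjusted_rand_score`; for this data the score is well below 1.0, which matches the intuition that the predicted clustering differs from the true one.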