r, clustered-index, hierarchical-clustering

What is the right way to supply the arguments x and y to adjustedRandIndex (Adjusted Rand Index)?


I'm attempting to use the Adjusted Rand Index (ARI) to compare clustering results, using the iris data set as an example. This is the code:

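# Drop the class label and keep the four numeric measurements
iris.data <- subset(iris, select = -Species)
iris.eucdist <- dist(iris.data, method = "euclidean")
iris.sqeucdist <- iris.eucdist^2
# "ward" was renamed "ward.D" in R >= 3.1.0; the old name still works but prints a message
iris.hc <- hclust(iris.sqeucdist, method = "ward.D")
plot(iris.hc, main = "Dendrogram of Ward's Method", labels = iris$Species)
# Cross-tabulate the 3-cluster solution against the true species
table(cutree(iris.hc, 3), iris$Species)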

##        setosa versicolor virginica
##   1     50          0         0
##   2      0         49        15
##   3      0          1        35

First, I computed the ARI (Hubert and Arabie, 1985) by hand from the values in the table above and got 0.7311986. However, when I use R, I cannot get the same answer.

library(mclust)
U=c(50,0,0,50,0,49,1,50,0,15,35,50)
V=c(50,0,0,50,0,49,15,64,0,1,35,36)
adjustedRandIndex(U,V)
## [1] 0.6961326

Perhaps the way I pass in the values is wrong. How should I call the function so that the result from R matches the manual computation?
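
For reference, the manual calculation mentioned above can be sketched in base R straight from the table counts. This is only an illustration of the Hubert and Arabie formula; `ari_from_table` is a hypothetical helper name, not a function from mclust, and the counts are typed in by hand from the table in the question.

```r
# Contingency table from the question: rows are clusters, columns are species
tab <- matrix(c(50,  0,  0,
                 0, 49, 15,
                 0,  1, 35),
              nrow = 3, byrow = TRUE,
              dimnames = list(cluster = c("1", "2", "3"),
                              species = c("setosa", "versicolor", "virginica")))

# Hypothetical helper: Adjusted Rand Index computed from a table of counts
ari_from_table <- function(tab) {
  n <- sum(tab)
  sum_cells <- sum(choose(tab, 2))           # pairs that agree in both partitions
  sum_rows  <- sum(choose(rowSums(tab), 2))  # pairs sharing a cluster
  sum_cols  <- sum(choose(colSums(tab), 2))  # pairs sharing a species
  expected  <- sum_rows * sum_cols / choose(n, 2)
  maximum   <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (maximum - expected)
}

ari_from_table(tab)
## [1] 0.7311986
```

Here the agreeing pairs sum to 3101, the row and column pair counts are 3871 and 3675, and there are choose(150, 2) = 11175 pairs in total, which gives the 0.7311986 quoted in the question.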


Solution

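  • The help page ?adjustedRandIndex states that x and y should be vectors of class labels, one entry per observation, not the results of the cross-tabulation. In the question, U and V hold the table's cell counts and margins, so the function treats each of those 12 numbers as a label, which is a different comparison. Pass the cluster assignments and the species directly: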

    adjustedRandIndex(cutree(iris.hc, 3), iris$Species)
    

    gives

    ## [1] 0.7311986
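
    If only the cross-tabulation is available and not the original label vectors, one possible workaround is to rebuild a pair of label vectors from the counts and pass those to adjustedRandIndex. The sketch below assumes the counts are typed in by hand from the table in the question; the names tab, cells, counts, x and y are illustrative only.

```r
# Counts from the question's table: rows are clusters, columns are species
tab <- matrix(c(50,  0,  0,
                 0, 49, 15,
                 0,  1, 35),
              nrow = 3, byrow = TRUE,
              dimnames = list(cluster = c("1", "2", "3"),
                              species = c("setosa", "versicolor", "virginica")))

# One (cluster, species) pair per table cell; cluster varies fastest,
# which matches the column-major order of as.vector(tab)
cells <- expand.grid(cluster = rownames(tab), species = colnames(tab),
                     stringsAsFactors = FALSE)
counts <- as.vector(tab)

# Repeat each pair as many times as its count: one label per observation
x <- rep(cells$cluster, times = counts)
y <- rep(cells$species, times = counts)

# Sanity check: cross-tabulating the rebuilt labels reproduces the counts
table(x, y)

# Only run the mclust call if the package is installed
if (requireNamespace("mclust", quietly = TRUE)) {
  print(mclust::adjustedRandIndex(x, y))
}
```

    Because the ARI depends only on the contingency table, this should give the same 0.7311986 as calling adjustedRandIndex on the original cluster assignments and species.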