What is the difference between the "compare" command applied to cluster algorithms and applied to cluster memberships in igraph?
As the manual page states:
compare (sg, le, method = "rand")
compare (membership (sg), membership (le))
If you read the documentation for `compare`, its format is:
compare(comm1, comm2, method = c("vi", "nmi", "split.join", "rand", "adjusted.rand"))
The documentation for `comm1` and `comm2` mentions the following:
comm1 : A communities object containing a community structure; or a numeric vector, the membership vector of the first community structure. The membership vector should contain the community id of each vertex, the numbering of the communities starts with one.
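To illustrate the second accepted form, here is a minimal sketch that passes plain numeric membership vectors to `compare`. The vectors `a` and `b` are made up for illustration and do not come from the manual page.

```r
library(igraph)

# Hypothetical membership vectors for four vertices (community ids start at 1)
a <- c(1, 1, 2, 2)
b <- c(2, 2, 1, 1)   # same grouping as `a`, only the labels are swapped

# The two vectors describe the same partition, so the Rand index
# reports full agreement and the variation of information is zero
rand_ab <- compare(a, b, method = "rand")
vi_ab   <- compare(a, b, method = "vi")

print(rand_ab)
print(vi_ab)
```

Only the grouping of the vertices matters here, not the particular id given to each community.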
The complete example given towards the end of the manual page is:
g <- make_graph("Zachary")
sg <- cluster_spinglass(g)
le <- cluster_leading_eigen(g)
compare(sg, le, method="rand")
compare(membership(sg), membership(le))
Now in the first case:
compare(sg, le, method="rand")
`sg` and `le` are the communities objects themselves, i.e. the results of community detection via the spin-glass model and via the leading non-negative eigenvector of the modularity matrix of the graph, respectively. In short, both contain community structures of the graph.
Now in the second case:
compare(membership(sg), membership(le))
This uses `membership`, which does the following:
membership gives the division of the vertices, into communities. It returns a numeric vector, one value for each vertex, the id of its community. Community ids start from one. Note that some algorithms calculate the complete (or incomplete) hierarchical structure of the communities, and not just a single partitioning. For these algorithms typically the membership for the highest modularity value is returned, but see also the manual pages of the individual algorithms
You can read more about the function on its manual page.
So, as you can see, this returns a numeric vector containing the membership information of each vertex, which is the second type of value permitted in the `comm1` and `comm2` parameters of the `compare` function.
Hence, both statements are essentially the same: they are just two ways of passing the same community structure to `compare`.
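A small sketch of that equivalence, reusing the objects from the manual page example: it checks that `membership` returns one id per vertex and that both input forms give the same score when the same `method` is used. The variable names `m_sg`, `m_le`, `by_object` and `by_vector` are introduced here only for illustration.

```r
library(igraph)

g  <- make_graph("Zachary")
sg <- cluster_spinglass(g)       # stochastic, so the groups can differ between runs
le <- cluster_leading_eigen(g)

# membership() yields one community id per vertex
m_sg <- membership(sg)
m_le <- membership(le)
print(length(m_le) == vcount(g))

# Same method, two input forms: communities objects vs. membership vectors
by_object <- compare(sg, le, method = "rand")
by_vector <- compare(m_sg, m_le, method = "rand")
print(all.equal(by_object, by_vector))
```

Because `cluster_spinglass` is randomized, the score itself may change from run to run, but the two calls within one run should always agree.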
If you run the code given towards the end of the documentation, you will see the following:
> g <- make_graph("Zachary")
> sg <- cluster_spinglass(g)
> le <- cluster_leading_eigen(g)
> compare(sg, le, method="rand")
[1] 0.9500891
> compare(membership(sg), membership(le))
[1] 0.2765712
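The difference in results is because the `method` argument is set to `"rand"` in the first call, while the second call omits it and therefore falls back to the default, `"vi"` (the first entry in the signature). If you pass the same `method` argument in the second call as well, you will see exactly the same result: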
> g <- make_graph("Zachary")
> sg <- cluster_spinglass(g)
> le <- cluster_leading_eigen(g)
> compare(sg, le, method="rand")
[1] 0.9500891
> compare(membership(sg), membership(le), method="rand")
[1] 0.9500891
As you can see, both provide identical results.
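The role of the default can also be checked on a tiny deterministic input. This is only a sketch with made-up membership vectors `a` and `b`; it shows that omitting `method` behaves like `method = "vi"` and not like `method = "rand"`.

```r
library(igraph)

# Hypothetical membership vectors for four vertices
a <- c(1, 1, 2, 2)
b <- c(1, 1, 1, 2)

default_score <- compare(a, b)                   # no method given
vi_score      <- compare(a, b, method = "vi")
rand_score    <- compare(a, b, method = "rand")

# The default matches "vi" and differs from "rand"
print(isTRUE(all.equal(default_score, vi_score)))
print(isTRUE(all.equal(default_score, rand_score)))
```

So when two `compare` calls disagree, the first thing to check is whether both were given the same `method`.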