python cluster-analysis igraph

variation_of_information results of cdlib and igraph are different


Assume we have two community partitions as below:

Community partition 1

Community0= [8, 16, 17, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]
Community1= [1, 2, 3, 7, 11, 12, 13, 15, 18]
Community2= [0, 4, 5, 6, 9, 10, 14, 22]

So, the community membership of each node (indexed by node ID) is:

[2, 1, 1, 1, 2, 2, 2, 1, 0, 2, 2, 1, 1, 1, 2, 1, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Community partition 2

Community0= [32, 33, 8, 16, 17, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31]
Community1= [0, 3, 4, 5, 6, 9, 10, 11, 22]
Community2= [1, 2, 7, 12, 13, 14, 15, 18]

So, the community membership of each node (indexed by node ID) is:

[1, 2, 2, 1, 1, 1, 1, 2, 0, 1, 1, 1, 2, 2, 2, 2, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

We want to compare these two partitions by computing their variation of information. We used evaluation.variation_of_information from cdlib and compare_communities with method='vi' from python-igraph.

However, cdlib returns 0.66 while igraph returns 0.46.

Why do the two results differ, and how should we measure it?


Solution

  • I checked the source code of both libraries.

    igraph returns the variation of information in natural units (nats), i.e. it uses the natural logarithm. cdlib returns it in bits, i.e. it uses the base-2 logarithm. The two results you get are consistent with each other: dividing a value in nats by ln(2) converts it to bits, and 0.46 / ln(2) ≈ 0.66.

    I updated the igraph documentation to mention that natural units are used.
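
As a cross-check, the sketch below recomputes the variation of information for the two membership lists from the question using only the Python standard library. It assumes the standard definition VI = H(X|Y) + H(Y|X); the function name and its `base` parameter are hypothetical helpers for illustration and are not part of igraph or cdlib.

```python
from collections import Counter
from math import log


def variation_of_information(a, b, base=None):
    """Variation of information between two membership lists.

    Hypothetical helper, not an igraph or cdlib function.
    Uses the natural logarithm (nats) unless `base` is given.
    """
    if len(a) != len(b):
        raise ValueError("membership lists must have the same length")
    n = len(a)
    size_a = Counter(a)          # community sizes in the first partition
    size_b = Counter(b)          # community sizes in the second partition
    joint = Counter(zip(a, b))   # contingency table of overlaps
    vi = 0.0
    for (x, y), n_xy in joint.items():
        # Contribution of this overlap to H(X|Y) + H(Y|X)
        vi -= (n_xy / n) * (log(n_xy / size_a[x]) + log(n_xy / size_b[y]))
    if base is not None:
        vi /= log(base)          # convert from nats to the requested base
    return vi


partition_1 = [2, 1, 1, 1, 2, 2, 2, 1, 0, 2, 2, 1, 1, 1, 2, 1, 0,
               0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
partition_2 = [1, 2, 2, 1, 1, 1, 1, 2, 0, 1, 1, 1, 2, 2, 2, 2, 0,
               0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

vi_nats = variation_of_information(partition_1, partition_2)
vi_bits = variation_of_information(partition_1, partition_2, base=2)
print(f"{vi_nats:.2f} nats, {vi_bits:.2f} bits")  # 0.46 nats, 0.66 bits
```

The natural-log value matches the igraph result and the base-2 value matches the cdlib result, so either library can be used as long as the unit is stated when reporting the number.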