Search code examples
pythongraph-toolcomplex-networks

Graph tool - Stochastic Block Model vs Leiden


I'm calculating network communities for 4 networks using 2 methods:

  1. 'Leiden' method, which gives me 7 (a), 13 (b), 19 (c), 22 (d) communities.

  2. 'Stochastic block Model', also checking group membership of the nodes by inspecting levels of the hierarchy, like so:


    state = gt.inference.minimize_nested_blockmodel_dl(g)
    state.print_summary()

    levels = state.get_levels()
    for s in levels:
        print(s)
        if s.get_N() == 1:
            break
    lstate = state.levels[0]
    b = lstate.get_blocks()
    print(b[10])

which prints:

<BlockState object with 228 blocks (21 nonempty), degree-corrected, for graph <Graph object, undirected, with 228 vertices and 1370 edges, 1 internal vertex property, 1 internal edge property, at 0x7fbaff1c8d50>, at 0x7fba9fac1bd0>
<BlockState object with 21 blocks (6 nonempty), for graph <Graph object, undirected, with 228 vertices and 96 edges, at 0x7fb9a3c51910>, at 0x7fb9a2dd1a10>
<BlockState object with 6 blocks (1 nonempty), for graph <Graph object, undirected, with 21 vertices and 20 edges, at 0x7fb9a3c51590>, at 0x7fb9a3c51ed0>
<BlockState object with 1 blocks (1 nonempty), for graph <Graph object, undirected, with 6 vertices and 1 edge, at 0x7fb9a6f034d0>, at 0x7fb9a3c51790>
190
<Graph object, undirected, with 3459 vertices and 134046 edges, 1 internal vertex property, 1 internal edge property, at 0x7fbb62e22790>
l: 0, N: 3459, B: 294
l: 1, N: 294, B: 85
l: 2, N: 85, B: 34
l: 3, N: 34, B: 12
l: 4, N: 12, B: 4
l: 5, N: 4, B: 1
l: 6, N: 1, B: 1

and draws:

enter image description here

This looks like having WAY more communities than using Leiden, and I'm trying to wrap my head around why, as well as this SBM concept.

Are these SBM graphs depicting adicional levels of hierarchy or is there something else going on here that justifies so many more communities?


Solution

  • The method of modularity maximization (of which Leiden is an implementation) has two important properties:

    1. It only searches for assortative communities (i.e. groups with more internal connections than external ones).
    2. It is a statistically inconsistent method, that will both overfit and underfit, depending on the situation. A discussion on this matter can be found here: https://skewed.de/tiago/blog/modularity-harmful

    The SBM inference method is different in both counts:

    1. It finds groups with arbitrary mixing patterns, i.e. preferences of connections to other groups. Assortativity is a special case, but there are many others possible patterns.
    2. It achieves this in a statistically principled manner, avoiding both overfitting and underfitting. For a theoretical introduction, see: https://arxiv.org/abs/1705.10225. For a discussion on the differences between inferential and non-inferential methods see: https://arxiv.org/abs/2112.00183

    Because of the above, one should not expect SBM inference and Leiden/Louvain to yield similar answers in general.

    Now, for whatever reason, you may be interested to find only assortative communities. You can also do that with the SBM, but using a more constrained parametrization. You can do this with graph-tool as explained here: https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#assortative-community-structure