Search code examples
graphnetworkxpartitioningmetis

Creating subgraphs with overlapping vertices


I've been looking for packages using which I could create subgraphs with overlapping vertices. From what I understand in Networkx and metis one could partition a graph into two or multi-parts. But I couldn't find how to partition into subgraphs with overlapping nodes.

Suggestions on libraries that support partitioning with overlapping vertices will be really helpful.

EDIT: I tried the angel algorithm in CDLIB to partition the original graph into subgraphs with 4 overlapping nodes.

import networkx as nx
from cdlib import algorithms
   
if __name__ == '__main__':
 
    g = nx.karate_club_graph()

    coms = algorithms.angel(g, threshold=4, min_community_size=10)
    print(coms.method_name)
    print(coms.method_parameters)  # Clustering parameters)
    print(coms.communities)
    print(coms.overlap)
    print(coms.node_coverage)

Output:

ANGEL
{'threshold': 4, 'min_community_size': 10}
[[14, 15, 18, 20, 22, 23, 27, 29, 30, 31, 32, 8], [1, 12, 13, 17, 19, 2, 21, 3, 7, 8], [14, 15, 18, 2, 20, 22, 30, 31, 33, 8]]
True
0.6470588235294118

From the communities returned, I understand 1 and 3 have an overlap of 4 nodes but 2 and 3 or 1 and 3 don't have an overlap size of 4 nodes. It is not clear to me how the overlap threshold (4 overlaps) has to be specified here algorithms. angel(g, threshold=4, min_community_size=10). I tried setting threshold=4 here to define an overlap size of 4 nodes. However, from the documentation available for angel

:param threshold: merging threshold in [0,1].

I am not sure how to translate the 4 overlaps to the value that has to be set between the bounds [0, 1]. Suggestions will be really helpful.


Solution

  • You can check out CDLIB:

    They have a great amount of community finding algorithms applicable to networkX, including some overlapping communities algorithms.

    Specifically about the angel algorithm in CDLIB:

    According to ANGEL: efficient, and effective, node-centric community discovery in static and dynamic networks, the threshold is not the overlapping threshold, but used as follows:

    If the ratio is greater than (or equal to) a given threshold, the merge is applied and the node label updated.

    • Basically, this value determines whether to further merge the nodes into bigger communities, and is not equivalent to the number of overlapping nodes.

    • Also, don't mistake "labels" with "node's labels" (as in nx.relabel_nodes(G, labels)). The "labels" referred are actually correlated with the Label Propagation Algorithm which is used by ANGEL.

    As for the effects of varying this threshold:

    [...] Increasing the threshold, we obtain a higher number of communities since lower quality merges cannot take place.

    [based on the comment by @J. M. Arnold]
    From ANGEL's github repository you can see that when threshold >= 1 only the min_comsize value is used:

    self.threshold = threshold
    
    if self.threshold < 1:
        self.min_community_size = max([3, min_comsize, int(1. / (1 - self.threshold))])
    else:
        self.min_community_size = min_comsize