python pandas networkx graph-theory

The group is not included in different clusters in networkx graph clustering


I have a symmetric square matrix. The rows contain weights (they are not normalized, but the greater the weight, the closer it is to 1). I want to cluster the groups, and I use networkx:

import pandas as pd
import networkx as nx
from networkx.algorithms.community import girvan_newman

matrix = pd.DataFrame({
    'group': ['g1', 'g2', 'g3', 'g4'],
    'g1': [2, 1, 0, 1],
    'g2': [1, 2, 0, 0],
    'g3': [0, 0, 2, 2],
    'g4': [1, 0, 2, 3]})
G = nx.Graph()
for group1 in matrix.index:
    for group2 in matrix.index:
        weight = matrix.loc[group1, group2] if group1 != group2 else 0
        if weight > 0:
            G.add_edge(group1, group2, weight=weight)
            
communities = girvan_newman(G)
first_communities = tuple(sorted(c) for c in next(communities))
print("Groups girvan_newman:", first_communities)

Groups girvan_newman: (['g1', 'g2', 'g4'], ['g3'])

The problem is that, for some reason, the 'g4' group is not combined with the 'g3' group, even though 'g4' has a greater weight with 'g3' than with 'g1' or 'g2'. It turns out that a group can only be a member of one cluster. I don't know how to fix this.


Solution

  • Your issue is due to having 'group' as a column rather than as the index.

    You would need:

    matrix = matrix.set_index('group')
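
To see why the index matters, here is a minimal sketch comparing the row labels before and after set_index, using the matrix from the question. The variable name `indexed` is only for illustration.

```python
import pandas as pd

matrix = pd.DataFrame({
    'group': ['g1', 'g2', 'g3', 'g4'],
    'g1': [2, 1, 0, 1],
    'g2': [1, 2, 0, 0],
    'g3': [0, 0, 2, 2],
    'g4': [1, 0, 2, 3]})

# Without set_index, the row labels are the default integers,
# so they never match the column names 'g1'..'g4'.
print(list(matrix.index))          # [0, 1, 2, 3]

# 'indexed' is an illustrative name for the re-indexed copy.
indexed = matrix.set_index('group')
print(list(indexed.index))         # ['g1', 'g2', 'g3', 'g4']

# Label-based lookup now addresses the intended cell.
print(indexed.loc['g4', 'g3'])     # 2
```

With the group names as row labels, looping over the index and looking up matrix.loc[group1, group2] reads the weight between the two named groups.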
    

    But why not create the graph directly with nx.from_pandas_adjacency?

    import networkx as nx
    from networkx.algorithms.community import girvan_newman
    
    G = nx.from_pandas_adjacency(matrix.set_index('group'))
    G.remove_edges_from(nx.selfloop_edges(G))
    
    communities = girvan_newman(G)
    first_communities = tuple(sorted(c) for c in next(communities))
    print("Groups girvan_newman:", first_communities)
    

    Output:

    Groups girvan_newman: (['g1', 'g2'], ['g3', 'g4'])
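
One thing to keep in mind: as far as I know, girvan_newman does not use edge weights by default. It removes the edge with the highest unweighted betweenness. If you want the weights to play a role, you can pass a most_valuable_edge function. The sketch below is one possible choice, not the only one. It assumes a larger weight means "closer", so it stores 1 / weight in a `distance` attribute (a name I made up) and uses weighted edge betweenness. The helper name `most_central_edge` is also just illustrative.

```python
import pandas as pd
import networkx as nx
from networkx.algorithms.community import girvan_newman

matrix = pd.DataFrame({
    'group': ['g1', 'g2', 'g3', 'g4'],
    'g1': [2, 1, 0, 1],
    'g2': [1, 2, 0, 0],
    'g3': [0, 0, 2, 2],
    'g4': [1, 0, 2, 3]})

G = nx.from_pandas_adjacency(matrix.set_index('group'))
G.remove_edges_from(nx.selfloop_edges(G))

# Assumption: a larger weight means the groups are closer, so the
# inverse is used as a distance for shortest-path computations.
for u, v, data in G.edges(data=True):
    data['distance'] = 1.0 / data['weight']

def most_central_edge(graph):
    # Pick the edge with the highest betweenness, measured on 'distance'.
    centrality = nx.edge_betweenness_centrality(graph, weight='distance')
    return max(centrality, key=centrality.get)

communities = girvan_newman(G, most_valuable_edge=most_central_edge)
first_communities = tuple(sorted(c) for c in next(communities))
print("Groups girvan_newman (weighted):", first_communities)
```

For this small graph the result is the same as the unweighted one, because the graph is a simple chain g2 - g1 - g4 - g3 and there is only one path between any two nodes. On denser graphs the weights can change which edge is removed first.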
    
