I have a symmetric square matrix. Each entry is a weight between two groups (the weights are not normalized, but a greater weight means the two groups are closer). I want to cluster the groups, and I use networkx:
import networkx as nx
import pandas as pd
from networkx.algorithms.community import girvan_newman

matrix = pd.DataFrame({
    'group': ['g1', 'g2', 'g3', 'g4'],
    'g1': [2, 1, 0, 1],
    'g2': [1, 2, 0, 0],
    'g3': [0, 0, 2, 2],
    'g4': [1, 0, 2, 3]})

G = nx.Graph()
for group1 in matrix.index:
    for group2 in matrix.index:
        # skip the diagonal (self-weights)
        weight = matrix.loc[group1, group2] if group1 != group2 else 0
        if weight > 0:
            G.add_edge(group1, group2, weight=weight)

communities = girvan_newman(G)
first_communities = tuple(sorted(c) for c in next(communities))
print("Groups girvan_newman:", first_communities)
Groups girvan_newman: (['g1', 'g2', 'g4'], ['g3'])
The problem is that, for some reason, 'g4' is not grouped with 'g3'. I understand that a group can only be a member of one cluster, but 'g4' has a greater weight with 'g3' than with 'g1' or 'g2'. How can I improve this?
Your issue is due to having 'group' as a column rather than as the index.
You would need:
matrix = matrix.set_index('group')
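To see the difference, here is a minimal sketch that compares the row labels before and after `set_index`. The variable name `indexed` is only illustrative; the data is the matrix from the question.

```python
import pandas as pd

# Same data as in the question: 'group' is an ordinary column here
matrix = pd.DataFrame({
    'group': ['g1', 'g2', 'g3', 'g4'],
    'g1': [2, 1, 0, 1],
    'g2': [1, 2, 0, 0],
    'g3': [0, 0, 2, 2],
    'g4': [1, 0, 2, 3]})

# Without set_index the row labels are just positions, not group names
print(list(matrix.index))   # [0, 1, 2, 3]

# After set_index the rows and the columns share the same labels
indexed = matrix.set_index('group')
print(list(indexed.index))  # ['g1', 'g2', 'g3', 'g4']

# Label-based lookup of a pair of groups now works
print(indexed.loc['g4', 'g3'])  # 2
```

With the default integer index, looping over `matrix.index` yields 0 to 3, so the row labels never line up with the 'g1' to 'g4' column labels.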
But why not create the graph directly with nx.from_pandas_adjacency:
import networkx as nx
from networkx.algorithms.community import girvan_newman
G = nx.from_pandas_adjacency(matrix.set_index('group'))
G.remove_edges_from(nx.selfloop_edges(G))
communities = girvan_newman(G)
first_communities = tuple(sorted(c) for c in next(communities))
print("Groups girvan_newman:", first_communities)
Output:
Groups girvan_newman: (['g1', 'g2'], ['g3', 'g4'])
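Note that, as far as I know, `girvan_newman` chooses the edge to remove by unweighted edge betweenness unless you pass `most_valuable_edge`, so the weights do not influence the result above. If you want them to count, something like the following sketch may help. It assumes a larger weight means "closer", so it stores `1 / weight` in a made-up `distance` attribute (shortest-path routines treat edge weights as distances); the helper name `most_central_edge` is also just illustrative.

```python
import networkx as nx
import pandas as pd
from networkx.algorithms.community import girvan_newman

matrix = pd.DataFrame(
    {'g1': [2, 1, 0, 1],
     'g2': [1, 2, 0, 0],
     'g3': [0, 0, 2, 2],
     'g4': [1, 0, 2, 3]},
    index=['g1', 'g2', 'g3', 'g4'])

G = nx.from_pandas_adjacency(matrix)
G.remove_edges_from(list(nx.selfloop_edges(G)))

# Assumption: higher weight = closer, so convert it to a distance
for u, v, data in G.edges(data=True):
    data['distance'] = 1.0 / data['weight']

def most_central_edge(graph):
    # Edge with the highest betweenness, computed on weighted shortest paths
    centrality = nx.edge_betweenness_centrality(graph, weight='distance')
    return max(centrality, key=centrality.get)

communities = girvan_newman(G, most_valuable_edge=most_central_edge)
weighted_communities = tuple(sorted(c) for c in next(communities))
print("Groups girvan_newman (weighted):", weighted_communities)
```

For this small matrix the graph is a simple chain (g2, g1, g4, g3), so the weighted and unweighted versions give the same split; the difference only shows up on denser graphs.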
Graph:
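A picture of the graph can be drawn with matplotlib. This is only a minimal sketch: the layout seed, the colormap, the node size and the output file name `communities.png` are arbitrary choices of mine, not anything required by networkx.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd
from networkx.algorithms.community import girvan_newman

matrix = pd.DataFrame({
    'group': ['g1', 'g2', 'g3', 'g4'],
    'g1': [2, 1, 0, 1],
    'g2': [1, 2, 0, 0],
    'g3': [0, 0, 2, 2],
    'g4': [1, 0, 2, 3]})

G = nx.from_pandas_adjacency(matrix.set_index('group'))
G.remove_edges_from(list(nx.selfloop_edges(G)))

# Map every node to the number of its community in the first split
first_split = next(girvan_newman(G))
community_of = {node: i for i, comm in enumerate(first_split) for node in comm}
node_colors = [community_of[node] for node in G.nodes]

# Draw nodes colored by community, with edge weights as labels
pos = nx.spring_layout(G, seed=42)
fig, ax = plt.subplots()
nx.draw_networkx(G, pos, ax=ax, node_color=node_colors,
                 cmap=plt.cm.Set2, node_size=800)
nx.draw_networkx_edge_labels(
    G, pos, edge_labels=nx.get_edge_attributes(G, 'weight'), ax=ax)
ax.set_axis_off()
fig.savefig("communities.png")
```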