I am using Python and NetworkX to model which speaker mentions which item in a conversation. To this end, I want to build a bipartite graph, where one set of nodes represents the speakers and the other represents the items. From this graph, I want to calculate some centrality measures, for example betweenness centrality. I also want to pass the weight of each edge as an argument, where the weight represents how often the speaker has mentioned an item. However, the bipartite implementation doesn't allow me to pass a weight as an argument.
What is the difference between the implementations and which one should I use to model my problem?
import networkx as nx
from networkx.algorithms import bipartite

speakers = ['Pink', 'Green']
items = ['Knife', 'Rope']

B = nx.Graph()
B.add_nodes_from(items, bipartite=0)
B.add_nodes_from(speakers, bipartite=1)
B.add_edge('Pink', 'Knife', weight=10)
B.add_edge('Pink', 'Rope', weight=4)
B.add_edge('Green', 'Rope', weight=2)
B.add_edge('Green', 'Knife', weight=7)

bottom_nodes, top_nodes = bipartite.sets(B)
print(nx.is_bipartite(B))
print(bipartite.betweenness_centrality(B, bottom_nodes))
print(nx.betweenness_centrality(B))
print(nx.betweenness_centrality(B, weight='weight'))
I get the following output:
True
{'Knife': 0.25, 'Rope': 0.25, 'Pink': 0.25, 'Green': 0.25}
{'Knife': 0.16666666666666666, 'Rope': 0.16666666666666666, 'Pink': 0.16666666666666666, 'Green': 0.16666666666666666}
{'Knife': 0.0, 'Rope': 0.3333333333333333, 'Pink': 0.0, 'Green': 0.3333333333333333}
I expected the results to be the same, so what is the difference between the implementations?
There is an explanation in the documentation of the bipartite betweenness centrality.
Betweenness centrality of a node v is the sum of the fraction of all-pairs shortest paths that pass through v. Values of betweenness are normalized by the maximum possible value, which for bipartite graphs is limited by the relative size of the two node sets.
The documentation goes on to describe the normalization factor.
In fact, if you check the source code for the bipartite version, you'll see that it directly calls the standard (undirected) betweenness centrality code and then applies the additional normalization factor. So the only reason the bipartite version exists is so that the values can be normalized differently.
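You can see this rescaling relationship on the graph from the question: the raw, unnormalized shortest-path counts are one set of numbers, and the two functions just divide them by different maximum values (the numbers below are taken from the question's own output):

```python
import networkx as nx
from networkx.algorithms import bipartite

# Rebuild the graph from the question.
B = nx.Graph()
B.add_nodes_from(['Knife', 'Rope'], bipartite=0)
B.add_nodes_from(['Pink', 'Green'], bipartite=1)
B.add_edge('Pink', 'Knife', weight=10)
B.add_edge('Pink', 'Rope', weight=4)
B.add_edge('Green', 'Rope', weight=2)
B.add_edge('Green', 'Knife', weight=7)

# Raw shortest-path counts, before any normalization: 0.5 per node here.
raw = nx.betweenness_centrality(B, normalized=False)

# The standard normalized values (0.1667 each) and the bipartite values
# (0.25 each) are both constant rescalings of the same raw numbers.
standard = nx.betweenness_centrality(B)
bi = bipartite.betweenness_centrality(B, ['Knife', 'Rope'])
print(raw, standard, bi)
```

The ranking of nodes is identical in all three; only the scale differs.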
Currently there is no option to include edge weights in the bipartite version. However, if you look at the source code, it would not be hard to add. If you do, think carefully about how weighted shortest paths should interact with the normalization so that the measure captures the process you're trying to explain; that means understanding how the normalization factor is derived. Otherwise, I would stick to the non-bipartite version, keeping in mind that its normalization does not account for the relative sizes of the two node sets, so be careful when comparing centrality values across the speaker and item sets.
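One more practical point: `nx.betweenness_centrality` interprets the `weight` attribute as a distance, so shortest paths prefer edges with small weights. A high mention count therefore pushes shortest paths away from that edge, which is why your weighted output looks inverted. If frequent mentions should mean a "closer" connection, you need to convert counts to distances first; the `1/weight` inversion below is one common choice, not the only one:

```python
import networkx as nx

# Graph from the question: edge weight = number of mentions.
B = nx.Graph()
B.add_nodes_from(['Knife', 'Rope'], bipartite=0)
B.add_nodes_from(['Pink', 'Green'], bipartite=1)
B.add_edge('Pink', 'Knife', weight=10)
B.add_edge('Pink', 'Rope', weight=4)
B.add_edge('Green', 'Rope', weight=2)
B.add_edge('Green', 'Knife', weight=7)

# NetworkX treats 'weight' as a length/cost, so invert the counts:
# many mentions -> short distance. (1/weight is one arbitrary choice.)
for u, v, data in B.edges(data=True):
    data['distance'] = 1.0 / data['weight']

# Now the heavily-used edges attract shortest paths instead of
# repelling them: Knife and Pink lie on the Green-Rope shortest path.
bc = nx.betweenness_centrality(B, weight='distance')
print(bc)
```

With this transformation the strongly-mentioned nodes (`Knife`, `Pink`) come out as the central ones, the opposite of what passing the raw counts as `weight` gave you.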