Tags: data-science, networkx, graph-theory, bipartite

Get node weight for a simple bipartite graph


I've created a bipartite NetworkX graph from a CSV file that maps disorders to symptoms, so each disorder may be linked to one or more symptoms.

import networkx as nx

G = nx.Graph()
for disorder, symptoms in csv_dictionary.items():
    for symptom in symptoms:
        G.add_edge(disorder, symptom)

What I need is to find which symptoms are connected to multiple disorders and sort them according to their weight. Any suggestions?


Solution

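  • You can use the degree of each node in the graph. Because every edge joins a disorder to a symptom, a symptom's degree is the number of disorders it is linked to, so every symptom with degree greater than 1 belongs to at least two disorders: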

    I've added an example csv_dictionary (please supply one as a minimal reproducible example in your next question) and built a set of all symptoms while creating the graph. You could also consider storing this information as a node attribute in the graph.

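    import networkx as nx
    
    # Example data: each disorder maps to the list of its symptoms
    csv_dictionary = {"a": ["A"], "b": ["B"], "c": ["A", "C"], "d": ["D"],
                      "e": ["E", "B"], "f": ["F"], "g": ["F"], "h": ["F"]}
    
    G = nx.Graph()
    
    all_symptoms = set()
    for disorder, symptoms in csv_dictionary.items():
        for symptom in symptoms:
            G.add_edge(disorder, symptom)
            all_symptoms.add(symptom)
    
    # Iterate over sorted(all_symptoms) so the output order is deterministic;
    # plain set iteration order can change between runs
    symptoms_with_multiple_diseases = [symptom for symptom in sorted(all_symptoms) if G.degree(symptom) > 1]
    print(symptoms_with_multiple_diseases)
    # ['A', 'B', 'F']
    
    # Sort by degree (number of linked disorders), highest first;
    # drop reverse=True for ascending order. Ties keep their previous order.
    sorted_symptoms = sorted(symptoms_with_multiple_diseases, key=lambda symptom: G.degree(symptom), reverse=True)
    print(sorted_symptoms)
    # ['F', 'A', 'B']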
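Following the suggestion to keep this information on the graph itself, here is a minimal sketch that tags each node with a `bipartite` attribute (0 for disorders, 1 for symptoms, the convention used in the NetworkX bipartite documentation) so the symptom side can be recovered without a separate set. It then stores each symptom's degree as a node attribute. The attribute name `weight` is an assumption chosen for illustration; nothing in the question defines it.

```python
import networkx as nx

# Same example data as in the answer
csv_dictionary = {"a": ["A"], "b": ["B"], "c": ["A", "C"], "d": ["D"],
                  "e": ["E", "B"], "f": ["F"], "g": ["F"], "h": ["F"]}

G = nx.Graph()
for disorder, symptoms in csv_dictionary.items():
    # Mark which side of the bipartite graph each node belongs to
    G.add_node(disorder, bipartite=0)
    for symptom in symptoms:
        G.add_node(symptom, bipartite=1)
        G.add_edge(disorder, symptom)

# Recover the symptom nodes from the node attribute
symptom_nodes = [n for n, side in G.nodes(data="bipartite") if side == 1]

# Hypothetical "weight" attribute: the number of disorders linked to each symptom
nx.set_node_attributes(G, {s: G.degree(s) for s in symptom_nodes}, name="weight")

# Symptoms shared by two or more disorders, highest weight first
shared = sorted(
    (s for s in symptom_nodes if G.nodes[s]["weight"] > 1),
    key=lambda s: G.nodes[s]["weight"],
    reverse=True,
)
print(shared)
# ['F', 'A', 'B']
```

Storing the value as an attribute means later code can read `G.nodes["F"]["weight"]` directly instead of recomputing degrees, at the cost of having to refresh the attribute if edges are added afterwards.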