I've created a bipartite networkx graph from a CSV file that maps Disorders to Symptoms. So, a disorder may be linked to one or more Symptoms.
for disorder, symptoms in csv_dictionary.items():
    for i in range(0, len(symptoms)):
        G.add_edge(disorder, symptoms[i])
What I need is to find what Symptoms are connected to multiple diseases and sort them according to their weight. Any suggestions?
You can use the degree of each node in the created graph. Every symptom with a degree larger than 1 belongs to at least two diseases:
I've added an example csv_dictionary (please supply one in your next question as a minimal reproducible example) and collected a set of all symptoms while building the graph. You could also think about storing this information as a node attribute in the graph.
import networkx as nx
csv_dictionary = {"a": ["A"], "b": ["B"], "c": ["A", "C"], "d": ["D"], "e": ["E", "B"], "f":["F"], "g":["F"], "h":["F"]}
G = nx.Graph()
all_symptoms = set()
for disorder, symptoms in csv_dictionary.items():
    for symptom in symptoms:
        G.add_edge(disorder, symptom)
        all_symptoms.add(symptom)
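symptoms_with_multiple_diseases = [symptom for symptom in all_symptoms if G.degree(symptom) > 1]
print(symptoms_with_multiple_diseases)
# ['B', 'F', 'A']  (set iteration order, so the order may differ between runs)
# sort ascending by degree, i.e. by the number of linked diseases
sorted_symptoms = sorted(symptoms_with_multiple_diseases, key=lambda symptom: G.degree(symptom))
print(sorted_symptoms)
# ['B', 'A', 'F']  (ties such as 'A' and 'B' keep the order of the list above)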
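If it helps, here is a rough sketch of the node-attribute idea, under a few assumptions of mine: the attribute name kind and its values "disorder" and "symptom" are made up for illustration (not a networkx convention), and I'm reading "weight" as the number of linked disorders, sorted from most to least shared. Ties are broken alphabetically so the result does not depend on set ordering.

```python
import networkx as nx

csv_dictionary = {"a": ["A"], "b": ["B"], "c": ["A", "C"], "d": ["D"],
                  "e": ["E", "B"], "f": ["F"], "g": ["F"], "h": ["F"]}

G = nx.Graph()
for disorder, symptoms in csv_dictionary.items():
    # "kind" is a hypothetical attribute name used to tell the two node sets apart
    G.add_node(disorder, kind="disorder")
    for symptom in symptoms:
        G.add_node(symptom, kind="symptom")
        G.add_edge(disorder, symptom)

# keep only symptom nodes that are linked to more than one disorder
shared = {node: degree for node, degree in G.degree()
          if G.nodes[node]["kind"] == "symptom" and degree > 1}

# sort by degree (descending), then by name so ties are deterministic
ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
print(ranked)
# [('F', 3), ('A', 2), ('B', 2)]
```

With the attribute stored on the nodes you no longer need the separate all_symptoms set, and a symptom that happens to share a name with a disorder would be easier to spot.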