I'm making a taxonomic cladogram using networkx for a university project. I'm trying to connect each taxonomic name with its parent name, going from each species name up through each family name until I reach the base of the cladogram. To do this, I'm comparing the names in one column with the names in the other column and adding an edge between the matching nodes. However, I can't search through the columns the way I want to, and the error is too long to find a solution with a quick Google search. If anybody knows a way to do this, please let me know.
This is the code I'm trying:
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
#df = pd.read_csv("E:/Escritorio/tp mat 3/pbdb_data.csv")  # lab path
df = pd.read_csv("D:/unsam/mat 3/TP 1/pbdb_data.csv")  # home PC path
df = df.drop(["orig_no","taxon_no","record_type","flags","difference","accepted_no","parent_no","immpar_no","immpar_name","container_no","reference_no","is_extant"], axis=1)
print(df)
G = nx.Graph()
G.add_nodes_from(df["taxon_name"])
for i in df["parent_name"]:
    for j in df["taxon_name"]:
        if df[i] == df[j]:
            x =+ 1
            print (x)
nx.draw_networkx(G)
plt.draw()
The CSV looks like this:
       taxon_rank      taxon_name   accepted_rank   accepted_name      parent_name  n_occs
0  unranked clade      Dinosauria  unranked clade      Dinosauria  Dinosauriformes    1952
1  unranked clade  Megalosauridae  unranked clade  Megalosauridae       Dinosauria       2
2  unranked clade    Ornithischia  unranked clade    Ornithischia       Dinosauria     236
3  unranked clade      Genasauria  unranked clade      Genasauria     Ornithischia     208
4  unranked clade        Cerapoda  unranked clade        Cerapoda       Genasauria     173
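I couldn't find a tree-graph function in networkx that does exactly what you're looking for. However, since every row already pairs a taxon with its parent, you can build the graph directly from those two columns: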
G = nx.from_pandas_edgelist(df[["parent_name", "taxon_name"]].drop_duplicates(), 'parent_name', 'taxon_name', create_using=nx.Graph())
nx.draw_networkx(G, with_labels=True)
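As for why the original loop fails: `df[i]` looks up a column whose name is the value of `i` (for example "Dinosauriformes"), so it raises a `KeyError`, and `x =+ 1` assigns `+1` to `x` instead of incrementing it. If you also want to walk from a taxon up to the base of the cladogram, a directed graph may be easier to work with. The following is only a minimal sketch under a few assumptions: the sample rows are copied from the question in place of the real CSV, every taxon has at most one parent, and `lineage` is a hypothetical helper name, not part of networkx.

```python
import networkx as nx
import pandas as pd

# Sample rows copied from the question; with the real file this would be
# the DataFrame returned by pd.read_csv(...).
df = pd.DataFrame({
    "taxon_name": ["Dinosauria", "Megalosauridae", "Ornithischia",
                   "Genasauria", "Cerapoda"],
    "parent_name": ["Dinosauriformes", "Dinosauria", "Dinosauria",
                    "Ornithischia", "Genasauria"],
})

# One directed edge per row, pointing from parent to child.
edges = df[["parent_name", "taxon_name"]].dropna().drop_duplicates()
T = nx.from_pandas_edgelist(
    edges, source="parent_name", target="taxon_name",
    create_using=nx.DiGraph(),
)


def lineage(tree, taxon):
    """Hypothetical helper: list the taxon and its ancestors up to the root.

    Assumes each node has at most one parent (one incoming edge).
    """
    path = [taxon]
    while True:
        parents = list(tree.predecessors(path[-1]))
        # Stop at the root, or if the data contains a cycle.
        if not parents or parents[0] in path:
            return path
        path.append(parents[0])


# Nodes with no incoming edge are the base of the cladogram.
roots = [node for node, degree in T.in_degree() if degree == 0]

print(roots)
print(lineage(T, "Cerapoda"))
print(nx.is_arborescence(T))  # True when the graph is a single rooted tree
```

To draw it, `nx.draw_networkx(T, with_labels=True)` followed by `plt.show()` should work. In a plain script, `plt.draw()` on its own usually does not open a window.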