
Is there a way to search and compare through 2 different columns of strings of a csv file?


I'm making a taxonomic cladogram using networkx for a university project. I'm trying to connect each taxonomic name with its parent name, going from each species name up through each family name to the base of the cladogram. To do this, I'm comparing the names in one column with the names in the other column and adding an edge between the matching nodes. However, I can't search through the columns the way I want to, and the error message is too long to find a solution with a quick Google search. If anybody knows a way to do this, please let me know.

This is the code I'm trying:

import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt


#df = pd.read_csv("E:/Escritorio/tp mat 3/pbdb_data.csv")  # lab path
df = pd.read_csv("D:/unsam/mat 3/TP 1/pbdb_data.csv")  # PC path
df = df.drop(["orig_no","taxon_no","record_type","flags","difference","accepted_no","parent_no","immpar_no","immpar_name","container_no","reference_no","is_extant"], axis=1)

print(df)

G = nx.Graph()
G.add_nodes_from(df["taxon_name"])

for i in df["parent_name"]:
    for j in df["taxon_name"]:
        if df[i] == df[j]:
            x =+ 1

print (x)
nx.draw_networkx(G)
plt.draw()

The CSV looks like this:

         taxon_rank                taxon_name   accepted_rank             accepted_name      parent_name  n_occs
0    unranked clade                Dinosauria  unranked clade                Dinosauria  Dinosauriformes    1952
1    unranked clade            Megalosauridae  unranked clade            Megalosauridae       Dinosauria       2
2    unranked clade              Ornithischia  unranked clade              Ornithischia       Dinosauria     236
3    unranked clade                Genasauria  unranked clade                Genasauria     Ornithischia     208
4    unranked clade                  Cerapoda  unranked clade                  Cerapoda       Genasauria     173

Solution

  • I couldn't find a tree-style graph drawing like the one you are looking for in networkx. However, you can try:

    G = nx.from_pandas_edgelist(df[["parent_name", "taxon_name"]].drop_duplicates(), 'parent_name', 'taxon_name', create_using=nx.Graph())
    nx.draw_networkx(G, with_labels=True)
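
For context on why the loop in the question fails: `df[i]` looks up a column whose name is the value in `i` (for example `df["Dinosauriformes"]`), which raises a `KeyError`, and `x =+ 1` assigns `+1` to `x` rather than incrementing it. The following is a minimal sketch, not a definitive implementation, of how the column comparison and the parent-to-child links could be done without nested loops. It assumes a small hand-built DataFrame copied from the CSV excerpt in the question instead of the real file, and it swaps in `nx.DiGraph` (rather than the `nx.Graph` used above) so that edge direction records which name is the parent. The variable names `has_parent_row`, `matches`, `roots`, and `lineage` are illustrative only.

```python
import networkx as nx
import pandas as pd

# Assumption: a few rows copied from the CSV excerpt in the question,
# used here instead of reading pbdb_data.csv.
df = pd.DataFrame({
    "taxon_name": ["Dinosauria", "Megalosauridae", "Ornithischia",
                   "Genasauria", "Cerapoda"],
    "parent_name": ["Dinosauriformes", "Dinosauria", "Dinosauria",
                    "Ornithischia", "Genasauria"],
})

# Compare the two columns without nested loops: True for each row whose
# parent_name also appears somewhere in the taxon_name column.
has_parent_row = df["parent_name"].isin(df["taxon_name"])
matches = int(has_parent_row.sum())

# Build one directed edge per (parent, child) pair.
edges = df[["parent_name", "taxon_name"]].drop_duplicates()
G = nx.from_pandas_edgelist(
    edges,
    source="parent_name",
    target="taxon_name",
    create_using=nx.DiGraph(),
)

# The base of the cladogram is any node with no incoming edge.
roots = [node for node, degree in G.in_degree() if degree == 0]

# Walk from the base down to one taxon to recover its lineage.
lineage = nx.shortest_path(G, roots[0], "Cerapoda")

print(matches)
print(roots)
print(lineage)
```

With a directed graph, the same `nx.draw_networkx(G)` call draws arrows from each parent to its children, which may make the hierarchy easier to read than undirected edges.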