Tags: python, pandas, dataframe, networkx, nan

How to check if a node is linked to another one?


I have a dataset built by extracting data from external sources. The output looks like this:

Node       Target
Jennifer   Maria
Luke       Mark
Johnny     nan
Ludo       Martin
Maria      nan
Mark       Luke
Mark       Christopher 

and so on

When I build a network using networkx, some of my nodes end up isolated because their Target field is null, although they should be linked to a source node (e.g., Maria to Jennifer). I am considering a directed network, but the problem would persist even if it were undirected: when I load the Node column as the node list, the nodes with a nan value in Target get linked to a node called nan. My question is: is there a way to check whether each node in the Node column has at least one link, by looking at the Target column? Happy to provide more information.

My expected output would be:

Node       Target
Jennifer   Maria
Luke       Mark
Johnny     nan
Ludo       Martin
Maria      Jennifer
Mark       Luke
Mark       Christopher 

so that the network is created correctly.


Solution

  • (a) Find the rows where Target is NaN, (b) find the rows whose Target is one of the nodes from (a), and (c) replace each NaN with the Node found in (b), then update your original dataframe.

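    import pandas as pd

    a = df.loc[df['Target'].isnull()]                           # (a) rows with no Target
    b = df[df['Target'].isin(a['Node'])]                        # (b) rows that point at those nodes
    b = b.rename(columns={'Node': 'Target', 'Target': 'Node'})  # swap names so b can be joined on Node
    c = pd.merge(a['Node'], b, how='left', on='Node').set_index(a.index)
    df.update(c)                                                # (c) update skips NaN, so only found sources are written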
    
    >>> a
         Node Target
    2  Johnny    NaN
    4   Maria    NaN
    
    >>> b
         Target   Node
    0  Jennifer  Maria
    
    >>> c
         Node    Target
    2  Johnny       NaN
    4   Maria  Jennifer
    
    >>> df
           Node       Target
    0  Jennifer        Maria
    1      Luke         Mark
    2    Johnny          NaN  # <- NaN
    3      Ludo       Martin
    4     Maria     Jennifer  # <- Jennifer
    5      Mark         Luke
    6      Mark  Christopher
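
The merge above relies on each node from (a) having at most one row pointing at it; with several, `c` would have more rows than `a` and `set_index(a.index)` would fail. A minimal sketch of one way around that, assuming the missing values are real NaN (not the string 'nan') and that taking the first source found is acceptable. The function name `fill_missing_targets` is hypothetical, not part of pandas:

```python
import numpy as np
import pandas as pd


def fill_missing_targets(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where a missing Target is replaced by a node that points at Node."""
    out = df.copy()
    # Rows that already describe an edge (both endpoints present).
    edges = out.dropna(subset=["Node", "Target"])
    # Map each target to the first node that points at it (assumption: first one wins).
    first_source = edges.drop_duplicates("Target").set_index("Target")["Node"]
    missing = out["Target"].isna()
    # Nodes that nobody points at map to NaN and therefore stay missing.
    out.loc[missing, "Target"] = out.loc[missing, "Node"].map(first_source)
    return out


# Sample data copied from the question.
df = pd.DataFrame({
    "Node": ["Jennifer", "Luke", "Johnny", "Ludo", "Maria", "Mark", "Mark"],
    "Target": ["Maria", "Mark", np.nan, "Martin", np.nan, "Luke", "Christopher"],
})
filled = fill_missing_targets(df)
print(filled)
```

On the sample data this gives the same result as the merge: Maria gets Jennifer and Johnny stays NaN. The original dataframe is left unchanged.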
    

    Old answer: as suggested by @AKX, remove the rows with NaN before building the graph:

    import networkx as nx
    
    edges = df[df.notna().all(axis=1)]  # keep only rows where both Node and Target are present
    
    G = nx.from_pandas_edgelist(edges, source='Node', target='Target')
    
    >>> G.edges
    EdgeView([('Jennifer', 'Maria'), ('Luke', 'Mark'),
              ('Mark', 'Christopher'), ('Ludo', 'Martin')])
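
Since the question mentions a directed network, here is a hedged variant of the same idea: drop the NaN rows, build a `DiGraph`, then add the names from the Node column back so that nodes without any edge are not lost. It is only a sketch on the sample data from the question, assuming the missing values are real NaN:

```python
import networkx as nx
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Node": ["Jennifer", "Luke", "Johnny", "Ludo", "Maria", "Mark", "Mark"],
    "Target": ["Maria", "Mark", np.nan, "Martin", np.nan, "Luke", "Christopher"],
})

# Keep only complete rows, so no edge ever points at a node called nan.
edges = df.dropna(subset=["Node", "Target"])
G = nx.from_pandas_edgelist(edges, source="Node", target="Target",
                            create_using=nx.DiGraph)

# Re-add every name from the Node column; existing nodes are left untouched.
G.add_nodes_from(df["Node"].tolist())

# Nodes with neither incoming nor outgoing edges.
isolated = list(nx.isolates(G))
print(isolated)
```

Here Maria is not isolated, because the Jennifer -> Maria edge already links her; only Johnny has no edge at all.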