Search code examples
pythonnetworkx

How to remove specific components in a disconnected network


Let’s suppose we have the below data (a sample of my whole dataset that counts thousands of rows):

Node Target 
1      2
1      3
1      5
2      1
2      3
2      6
7      8 
7      12  
9      13
9      15
9      14

Clearly, if I plot in a graph this data I have two components that are disconnected. I am wondering how to isolate or remove one component from my network, e.g., the smallest one. I would say that first I should identify the components, then isolate/filtering out the component(s) that I am not interested in.

G = nx.from_pandas_edgelist(df, 'Node', 'Target')

connected_com=[len(c) for c in sorted(nx.connected_components(G), key=len, reverse=True)]

Now I should create a network only with data from the largest component:

largest_cc = max(nx.connected_components(G), key=len) 

This is easy in case of two components. However, if I would like to select two components and exclude one, how should I do? This is my question.


Solution

  • In the example data you provided, 3 islands are obtained when plotting the graph with the code below:

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import networkx as nx 
    
    df=pd.read_fwf('data.txt')
    G = nx.from_pandas_edgelist(df, 'Node', 'Target')
    nx.draw(G,with_labels=True)
    

    And the graph looks like that:

    enter image description here

    Now if you want to only keep the biggest two islands you can use the nx.connected_components(G) function that you mentioned and store the two biggest components. Below is the code to do this:

    N_subs=2 #Number of biggest islands you want to keep
    G_sub=[]
    largest_components=[]
    for i in range(N_subs):
      largest_components.append(sorted(nx.connected_components(G), key=len, reverse=True)[i])
      G_sub.append(G.subgraph(largest_components[i]))
    

    You will then need to create a subgraph of G that is composed of both islands. You can use nx.compose_all to do that. And you can then just plot your subgraph

    G_subgraphs=nx.compose_all(G_sub)
    nx.draw(G_subgraphs,with_labels=True)
    

    So overall the code looks like that:

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import networkx as nx 
    df=pd.read_fwf('data.txt')
    G = nx.from_pandas_edgelist(df, 'Node', 'Target')
    
    N_subs=2
    G_sub=[]
    largest_components=[]
    for i in range(N_subs):
      largest_components.append(sorted(nx.connected_components(G), key=len, reverse=True)[i])
      G_sub.append(G.subgraph(largest_components[i]))
    
    G_subgraphs=nx.compose_all(G_sub)
    nx.draw(G_subgraphs,with_labels=True)
    

    And the output of this code gives:

    strong text

    Note: According to this, nx.connected_components is best used for undirected graphs. Since you are dealing with directed graphs, you might want to use strongly_connected_components(G) or weakly_connected_components(G) instead.