Tags: python, pandas, networkx

Filter edges that connect nodes from different classes


The dataset, which has the form

Source     Target     Source_Class     Target_Class
1          2          1                     0
1          3          1                     0
2          1          0                     1 
4          2          0                     0
5          4          0                     0
5          1          0                     1
3          1          0                     1

is used to build a network, where Source_Class is an attribute of the Source node and Target_Class is an attribute of the Target node. I need to find the edges that link two nodes of different classes, for example 1 (class 1) and 2 (class 0), or 1 and 3, and so on. In other words, I want the list of edges that act as 'connectors' in the network because they join nodes of different classes.
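Since the classes are node attributes, a consistent dataset should give each node exactly one class no matter which end of an edge it appears on. A minimal sketch of that check, assuming the edge list is loaded into a pandas DataFrame named `df` with the column names above (`nodes`, `consistent` and `node_class` are hypothetical names for illustration):

```python
import pandas as pd

# Edge list from the question (assumed to be loaded as a DataFrame)
df = pd.DataFrame({
    'Source':       [1, 1, 2, 4, 5, 5, 3],
    'Target':       [2, 3, 1, 2, 4, 1, 1],
    'Source_Class': [1, 1, 0, 0, 0, 0, 0],
    'Target_Class': [0, 0, 1, 0, 0, 1, 1],
})

# Stack (node, class) pairs taken from both ends of every edge
nodes = pd.concat([
    df[['Source', 'Source_Class']].rename(columns={'Source': 'Node', 'Source_Class': 'Class'}),
    df[['Target', 'Target_Class']].rename(columns={'Target': 'Node', 'Target_Class': 'Class'}),
], ignore_index=True)

# A consistent dataset gives every node exactly one class
consistent = bool((nodes.groupby('Node')['Class'].nunique() == 1).all())

# Hypothetical helper: node -> class mapping (only meaningful if consistent)
node_class = nodes.drop_duplicates().set_index('Node')['Class'].to_dict()
print(consistent, node_class)
```

If the check passes, the resulting dictionary could be attached to a graph as a node attribute, for example with networkx's `set_node_attributes`.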

Put like that, the problem seems easy to solve, but I am not sure how to count each Source/Target pair only once. For instance, I could use a logical OR to select the rows where Source_Class is 0 and Target_Class is 1, or Source_Class is 1 and Target_Class is 0. Since the network is undirected, though, this gives duplicates (1-2 and 2-1, 1-3 and 3-1):

Source     Target     Source_Class     Target_Class
1          2          1                     0
1          3          1                     0
2          1          0                     1
5          1          0                     1
3          1          0                     1
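A minimal sketch of that naive filter, assuming the data is in a pandas DataFrame named `df` (`mask` and `cross` are illustrative names). It reproduces the table above, duplicates included:

```python
import pandas as pd

# Edge list from the question (assumed to be loaded as a DataFrame)
df = pd.DataFrame({
    'Source':       [1, 1, 2, 4, 5, 5, 3],
    'Target':       [2, 3, 1, 2, 4, 1, 1],
    'Source_Class': [1, 1, 0, 0, 0, 0, 0],
    'Target_Class': [0, 0, 1, 0, 0, 1, 1],
})

# The "logical sum" from the question: (0 and 1) OR (1 and 0)
mask = (((df['Source_Class'] == 0) & (df['Target_Class'] == 1))
        | ((df['Source_Class'] == 1) & (df['Target_Class'] == 0)))

# With 0/1 classes this is equivalent to a plain inequality test
assert (mask == (df['Source_Class'] != df['Target_Class'])).all()

cross = df[mask]
print(cross)  # both 1-2 and 2-1 (and both 1-3 and 3-1) survive
```

The inequality form also works unchanged if there are more than two classes, which the explicit OR does not.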

My expected output would be:

Source Target  Different 
1        2         1
1        3         1
5        1         1

How can I filter out these duplicates?


Solution

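  • Use np.sort to order each Source/Target pair, so that 1-2 and 2-1 produce the same key. Then group by the sorted pair, keep the first row of each group, and filter on the classes: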

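    import numpy as np
    import pandas as pd  # df is the question's DataFrame

    # Sort each (Source, Target) row so that 1-2 and 2-1 get the same key
    a = np.sort(df[['Source', 'Target']], axis=1)

    # Keep the first row per undirected edge, then keep only cross-class edges
    (df.groupby([a[:, 0], a[:, 1]]).head(1)
       .reset_index(drop=True)
       .query('Source_Class != Target_Class')
    )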
    

    Output:

       Source  Target  Source_Class  Target_Class
    0       1       2             1             0
    1       1       3             1             0
    4       5       1             0             1
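A variant that avoids groupby, offered as a swapped-in technique rather than the answer's method and assuming node ids are mutually comparable (here integers): build a canonical (smaller id, larger id) key per row with `np.minimum`/`np.maximum`, flag repeated keys with `DataFrame.duplicated`, and add the `Different` column from the question's expected output (`u`, `v`, `first` and `out` are illustrative names):

```python
import numpy as np
import pandas as pd

# Edge list from the question (assumed to be loaded as a DataFrame)
df = pd.DataFrame({
    'Source':       [1, 1, 2, 4, 5, 5, 3],
    'Target':       [2, 3, 1, 2, 4, 1, 1],
    'Source_Class': [1, 1, 0, 0, 0, 0, 0],
    'Target_Class': [0, 0, 1, 0, 0, 1, 1],
})

# Canonical key for an undirected edge: (smaller id, larger id)
u = np.minimum(df['Source'], df['Target'])
v = np.maximum(df['Source'], df['Target'])

# True for the first occurrence of each undirected edge
first = ~pd.DataFrame({'u': u, 'v': v}).duplicated()

# Keep first occurrences whose endpoints have different classes
out = df.loc[first & (df['Source_Class'] != df['Target_Class']), ['Source', 'Target']]
out = out.assign(Different=1).reset_index(drop=True)
print(out)
```

Like `head(1)`, this keeps the orientation of the first row seen for each edge. Deduplicating before or after the class filter gives the same rows, because both orientations of an edge carry the same pair of node classes.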