Search code examples
pandasmergegrouping

mask linked data across two columns


This is an subset of huge dataset, I want that all the numbers in ID-2 columns linked directly or indirectly to 1 in ID-1 to be grouped next to 1, same with 2 and so on.

   ID1  ID2
0    1    2
1    1    4
2    2    6
3    2    5
4    3    7

to:

  ID1  ID2
0   1    2
1        4
2        6
3        5
4   3    7

Solution

  • You can use boolean indexing:

    # map ID2 values to ID1 or keep ID1
    s = df['ID1'].map(df.set_index('ID2')['ID1']).fillna(df['ID1'])
    
    # mask duplicated values
    df.loc[s.duplicated(), 'ID1'] = ''
    

    output:

      ID1  ID2
    0   1    2
    1        4
    2        6
    3        5
    4   3    7
    

    If you prefer a new column:

    s = df['ID1'].map(df.set_index('ID2')['ID1']).fillna(df['ID1'])
    
    df['new'] = df['ID1'].mask(s.duplicated(), '')
    

    output:

       ID1  ID2 new
    0    1    2   1
    1    1    4    
    2    2    6    
    3    2    5    
    4    3    7   3