Search code examples
pythonpandasduplicatesdrop-duplicates

How to drop duplicates in one column based on values in 2 other columns in DataFrame in Python Pandas?


I have DataFrame in Python Pandas like below:

data types:

  • ID - int

  • TYPE - object

  • TG_A - int

  • TG_B - int

    ID TYPE TG_A TG_B
    111 A 1 0
    111 B 1 0
    222 B 1 0
    222 A 1 0
    333 B 0 1
    333 A 0 1

And I need to drop duplicates in above DataFrame, so as to:

  • If value in ID in my DF is duplicated -> drop rows where TYPE = B and TG_A = 1 or TYPE = A and TG_B = 1

So, as a result I need something like below:

ID  | TYPE | TG_A | TG_B
----|------|------|-----
111 | A    | 1    | 0
222 | A    | 1    | 0
333 | B    | 0    | 1

How can I do that in Python Pandas ?


Solution

  • You can use two boolean masks and groupby.idxmax to get the first non matching value:

    m1 = df['TYPE'].eq('B') & df['TG_A'].eq(1)
    m2 = df['TYPE'].eq('A') & df['TG_B'].eq(1)
    
    out = df.loc[(~(m1|m2)).groupby(df['ID']).idxmax()]
    

    Output:

        ID TYPE  TG_A  TG_B
    0  111    A     1     0
    3  222    A     1     0
    4  333    B     0     1