Tags: python, pandas, group-by, drop-duplicates

Drop duplicates when a string appears more than once in a column for a group (pandas)


Is there a way to group a DataFrame by two columns (Id, Name) and, if a certain string "x_1" appears more than once in the "Name" column, keep only its first row (the first occurrence)?

Id Name Value
1  x_1  23
1  x_2  24
1  x_1  23
1  x_3  27
1  x_4  28
1  x_3  29
1  x_4  30

Desired output

   Id Name Value
    1  x_1  23
    1  x_2  24
    1  x_3  27
    1  x_4  28
    1  x_3  29
    1  x_4  30

I tried the following, but it also removes the repeated x_3 and x_4 rows, which I want to keep: df.drop_duplicates(subset=['Id', 'Name'], keep='first')
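
For reference, a minimal sketch that reproduces the attempt on the sample data. The frame is typed in by hand from the table above, and the variable name `deduped` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 1, 1, 1],
    "Name": ["x_1", "x_2", "x_1", "x_3", "x_4", "x_3", "x_4"],
    "Value": [23, 24, 23, 27, 28, 29, 30],
})

# drop_duplicates treats every (Id, Name) pair the same way,
# so the second x_3 and x_4 rows are removed along with the second x_1
deduped = df.drop_duplicates(subset=["Id", "Name"], keep="first")
print(deduped)
```

Only four rows survive here, which is why a plain `drop_duplicates` on both columns does not give the desired output.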


Solution

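  • Use duplicated to flag repeated (Id, Name) pairs, and drop a flagged row only when its Name is "x_1". Passing both columns matters: duplicated('Id') alone marks every row after the first in each Id, so an "x_1" that is not the first row of its group would be dropped as well.

    df[~(df.duplicated(['Id', 'Name']) & df['Name'].eq('x_1'))]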
    

       Id Name  Value
    0   1  x_1     23
    1   1  x_2     24
    3   1  x_3     27
    4   1  x_4     28
    5   1  x_3     29
    6   1  x_4     30
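
A self-contained sketch of the masking approach, under a few assumptions: the helper name `drop_repeated` and its `target` parameter are hypothetical, and the rows with `Id` 2 are made up to show a group where "x_1" is not the first row. The mask here keys `duplicated` on both `Id` and `Name`.

```python
import pandas as pd

# First seven rows are the question's sample; the Id == 2 rows are invented
# for illustration (an x_1 that is not the first row of its group)
df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
    "Name": ["x_1", "x_2", "x_1", "x_3", "x_4", "x_3", "x_4", "x_2", "x_1", "x_1"],
    "Value": [23, 24, 23, 27, 28, 29, 30, 31, 32, 33],
})


def drop_repeated(frame, target="x_1"):
    """Drop second and later rows of `target` within each Id; keep all other rows."""
    # True for every occurrence of an (Id, Name) pair after the first
    repeated = frame.duplicated(["Id", "Name"])
    # True only for rows whose Name is the string of interest
    is_target = frame["Name"].eq(target)
    # Remove a row only when both conditions hold
    return frame[~(repeated & is_target)]


result = drop_repeated(df)
print(result)
```

The repeated x_3 and x_4 rows are kept because they fail the `is_target` test. With `df.duplicated('Id')` in place of the two-column key, row 8 (the first "x_1" of `Id` 2) would be dropped too, since it is not the first row of its group.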