Search code examples
pythonpandasdataframegroup-by

Drop group if another column only has duplicate values - pandas dataframe


My question is quite similar to this one: Drop group if another column has duplicate values - pandas dataframe

I have the following dataframe:

letter  value   many    other   variables
A          5            
A          5            
A          8            
A          9            
B          3            
B         10            
B         10            
B          4            
B          5            
B          9            
C         10            
C         10            
C         10            
D          6            
D          8            
D         10            
E          5            
E          5            
E          5            
F          4            
F          4            

And when grouping it by letter I want to remove all the resulting groups that only have value that repeats, thus getting a result like this:

letter      value   many    other   variables
A            5          
A            5          
A            8          
A            9          
B            3          
B           10          
B           10          
B            4          
B            5          
B            9          
D            6          
D            8          
D           10          

I am afraid that if I use the duplicate() function similarly to the question I mentioned at the beggining I would be deleting groups (or the rows in) 'A' and 'B' which should rather stay in their place.


Solution

  • You have several possibilities.

    Using duplicated and groupby.transform:

    m = (df.duplicated(subset=['letter', 'value'], keep=False)
           .groupby(df['letter']).transform('all')
        )
    
    out = df[~m]
    

    NB. this won't drop groups with a single row.

    Using groupby.transform and nunique:

    out = df[df.groupby('letter')['value'].transform('nunique').gt(1)]
    

    NB. this will drop groups with a single row.

    Output:

       letter  value  many  other  variables
    0       A      5   NaN    NaN        NaN
    1       A      5   NaN    NaN        NaN
    2       A      8   NaN    NaN        NaN
    3       A      9   NaN    NaN        NaN
    4       B      3   NaN    NaN        NaN
    5       B     10   NaN    NaN        NaN
    6       B     10   NaN    NaN        NaN
    7       B      4   NaN    NaN        NaN
    8       B      5   NaN    NaN        NaN
    9       B      9   NaN    NaN        NaN
    13      D      6   NaN    NaN        NaN
    14      D      8   NaN    NaN        NaN
    15      D     10   NaN    NaN        NaN