Search code examples
pandaspandas-groupbydrop-duplicates

Drop all group rows when met a condition?


I have pandas data frame have two-level group based on 'col10' and 'col1'.All I want to do is, drop all group rows if a specified value in another column repeated or this value did not existed in the group (keep the group which the specified value existed once only) for example:

  • The original data frame:

    df = pd.DataFrame( {'col0':['A','A','A','A','A','B','B','B','B','B','B','B','c'],'col1':[1,1,2,2,2,1,1,1,1,2,2,2,1], 'col2':[1,2,1,2,3,1,2,1,2,2,2,2,1]})

enter image description here

I need to keep the the rows for the group for example (['A',1],['A',2],['B',2]) in this original DF

  • The desired dataframe:

enter image description here

  • I tried this step:

    df.groupby(['col0','col1']).apply(lambda x: (x['col2']==1).sum()==1)

where the result is

col0  col1
A     1        True
      2        True
B     1       False
      2        True
c     1       False
dtype: bool

How to create the desired Df based on this bool?


Solution

  • You can do this as below:

    m=(df.groupby(['col0','col1'])['col2'].
         transform(lambda x: np.where((x.eq(1)).sum()==1,x,np.nan)).dropna().index)
    df.loc[m]
    

    Or:

    df[df.groupby(['col0','col1'])['col2'].transform(lambda x: x.eq(1).sum()==1)]
    

       col0  col1  col2
    0     A     1     1
    1     A     1     2
    2     A     2     1
    3     A     2     2
    4     A     2     3
    12    c     1     1