I have pandas data frame have two-level group based on 'col10
' and 'col1
'.All I want to do is, drop all group rows if a specified value in another column repeated or this value did not existed in the group (keep the group which the specified value existed once only) for example:
The original data frame:
df = pd.DataFrame( {'col0':['A','A','A','A','A','B','B','B','B','B','B','B','c'],'col1':[1,1,2,2,2,1,1,1,1,2,2,2,1], 'col2':[1,2,1,2,3,1,2,1,2,2,2,2,1]})
I need to keep the the rows for the group for example (['A',1],['A',2],['B',2])
in this original DF
I tried this step:
df.groupby(['col0','col1']).apply(lambda x: (x['col2']==1).sum()==1)
where the result is
col0 col1
A 1 True
2 True
B 1 False
2 True
c 1 False
dtype: bool
How to create the desired Df based on this bool?
You can do this as below:
m=(df.groupby(['col0','col1'])['col2'].
transform(lambda x: np.where((x.eq(1)).sum()==1,x,np.nan)).dropna().index)
df.loc[m]
Or:
df[df.groupby(['col0','col1'])['col2'].transform(lambda x: x.eq(1).sum()==1)]
col0 col1 col2
0 A 1 1
1 A 1 2
2 A 2 1
3 A 2 2
4 A 2 3
12 c 1 1