I have the following pandas dataframe and I would like a function that returns the ID's data with at least 1 True value in bool_1
, 2 True values in bool_2
and 3 True values in bool_3
column, using the groupby
function.
index ID bool_1 bool_2 bool_3
0 7 True True True
1 7 False True True
2 7 False False True
3 8 True True True
4 8 True True True
5 8 False False True
6 9 True True True
7 9 True False True
8 9 True False True
9 9 True False False
As output I would expect complete data for ID
7
and 8
to be returned, since 9
has only 1 True value for bool_2
.
Any idea for that function? Thank you!
You can specify number of True
s in dictionary, so possible compare by DataFrame.ge
for greate or equal count of True
s by aggregate sum and filter original DataFrame
by boolean indexing
with Series.isin
:
d = {'bool_1':1, 'bool_2':2,'bool_3':3}
ids = df.groupby('ID')[list(d.keys())].sum().ge(d).all(axis=1)
print (ids)
ID
7 True
8 True
9 False
dtype: bool
out = df[df['ID'].isin(ids.index[ids])]
print (out)
index ID bool_1 bool_2 bool_3
0 0 7 True True True
1 1 7 False True True
2 2 7 False False True
3 3 8 True True True
4 4 8 True True True
5 5 8 False False True
Another idea is use GroupBy.transform
for create boolean mask:
d = {'bool_1':1, 'bool_2':2,'bool_3':3}
mask = df.groupby('ID')[list(d.keys())].transform('sum').ge(d).all(axis=1)
print (mask)
0 True
1 True
2 True
3 True
4 True
5 True
6 False
7 False
8 False
9 False
dtype: bool
out = df[mask]
print (out)
index ID bool_1 bool_2 bool_3
0 0 7 True True True
1 1 7 False True True
2 2 7 False False True
3 3 8 True True True
4 4 8 True True True
5 5 8 False False True