I am working on figuring out how to count number of specific columns that match certain conditions and filtering any above 1. I am thinking it would require adding an additional column that includes the count()
, and then a separate step of seeing if the count()
is greater than 1. Finally, removing the unnecessary count()
column.
For the example below, I am interested in only colB, colC, colD, and colE greater than 0.
The example attempted code I have tried looks like:
#Step a
filtData['countCol'] = filtData[(filtData['colB']>0) & (filtData['colC']>0) &
(filtData['colD']>0) & (filtData['colE']>0)].count()
#Step b
filtData['countCol'] = filtData[filtData['countCol'] > 1]
#Step c
filtData = filtData.drop(columns=['countCol'])
Input:
colA colB colC colD colE colF
0 1105 0.00 867 3.4 0.00 text1
1 1106 3 3.22 1 3 text2
2 1107 0.5 0 0 1 text3
3 1110 0 23 0 0 text4
4 1019 9 0.0 2 0 text5
5 1267 0 0.0 0 2 text6
Output Step a:
colA colB colC colD colE colF countCol
0 1105 0.00 867 3.4 0.00 text1 2
1 1106 3 3.22 1 3 text2 4
2 1107 0.5 0 0 1 text3 2
3 1110 0 23 0 0 text4 1
4 1019 9 0.0 2 0 text5 2
5 1267 0 0.0 0 2 text6 1
Output Step b:
colA colB colC colD colE colF countCol
0 1105 0.00 867 3.4 0.00 text1 2
1 1106 3 3.22 1 3 text2 4
2 1107 0.5 0 0 1 text3 2
4 1019 9 0.0 2 0 text5 2
Output Step c:
colA colB colC colD colE colF
0 1105 0.00 867 3.4 0.00 text1
1 1106 3 3.22 1 3 text2
2 1107 0.5 0 0 1 text3
4 1019 9 0.0 2 0 text5
If there is a way to perform this in one step that is elegant (isn't too advanced of an expression to understand), that would be ideal. I am still learning pandas, so maybe performing the filtering I am looking for would need to be broken in the three sub-steps shown.
You can filter columns by list, comapre by 0
abd count True
s by sum
, last filter rows greater like 1
in boolean indexing
:
out = filtData[(filtData[['colB','colC','colD','colE']]>0).sum(axis=1) > 1]
Or is possible use DataFrame.gt
for greater
:
out = filtData[filtData[['colB','colC','colD','colE']].gt(0).sum(axis=1).gt(1)]
print (out)
colA colB colC colD colE colF
0 1105 0.0 867.00 3.4 0.0 text1
1 1106 3.0 3.22 1.0 3.0 text2
2 1107 0.5 0.00 0.0 1.0 text3
4 1019 9.0 0.00 2.0 0.0 text5
How it working:
print (filtData[['colB','colC','colD','colE']].gt(0))
colB colC colD colE
0 False True True False
1 True True True True
2 True False False True
3 False True False False
4 True False True False
5 False False False True
print (filtData[['colB','colC','colD','colE']].gt(0).sum(axis=1))
0 2
1 4
2 2
3 1
4 2
5 1
dtype: int64
print (filtData[['colB','colC','colD','colE']].gt(0).sum(axis=1).gt(1))
0 True
1 True
2 True
3 False
4 True
5 False
dtype: bool