I have the following list of dataframes. I need to remove from each df the rows that include only the values NaN or Zero. I cannot change all Zeros to NaN, as in other columns, they have a valid meaning rather than reflecting missing/not a number info. Ideally, i would like to combine the commands in this sort of format [x.dropna(axis=0, how='all') for x in dfs]. thank you!
data
df1 = pd.DataFrame(data={'id':[1,2,0,4,5,6],
'a': [41,41,0,43,40,41],
'b': [21,20,0,19,23,23],
'c': [0,0,0,0,43,0],
'd': [12,11,0,0,0,0]})
df2 = pd.DataFrame(data={'id':[0,2,0,4,5,6],
'a': [0,41,0,43,40,41],
'b': ['NaN',20,'NaN',19,23,23],
'c': [0,0,0,0,43,0],
'd': [0,11,0,0,0,0]})
df3 = pd.DataFrame(data={'id':[1,2,'NaN','NaN',5,0],
'a': [41,41,0,43,40,0],
'b': [21,20,0,19,23,0],
'c': [0,0,0,0,43,0],
'd': [12,11,0,0,0,0]})
dfs = [df1,df2,df3]
expected output
[ id a b c d
0 1 41 21 0 12
1 2 41 20 0 11
2 4 43 19 0 0
3 5 40 23 43 0
4 6 41 23 0 0,
id a b c d
0 2 41 20 0 11
1 4 43 19 0 0
2 5 40 23 43 0
3 6 41 23 0 0,
id a b c d
0 1 41 21 0 12
1 2 41 20 0 11
2 NaN 43 19 0 0
3 5 40 23 43 0
4 0 0 0 0 0]
You can replace 0 to missing values, but better is removed original DataFrames by this repalced one with tested all rows if exist at least one non NaN
value in boolean indexing
:
dfs = [x[x.replace(0, np.nan).notna().any(axis=1)] for x in dfs]
print (dfs)
[ id a b c d
0 1 41 21 0 12
1 2 41 20 0 11
3 4 43 19 0 0
4 5 40 23 43 0
5 6 41 23 0 0, id a b c d
1 2 41 20.0 0 11
3 4 43 19.0 0 0
4 5 40 23.0 43 0
5 6 41 23.0 0 0, id a b c d
0 1.0 41 21 0 12
1 2.0 41 20 0 11
3 NaN 43 19 0 0
4 5.0 40 23 43 0]
If there are only positive values is possible test if sum
is not 0
:
dfs = [x[x.sum(axis=1).ne(0)] for x in dfs]