Search code examples
pythonpandassubsetdrop

remove rows where columns contain only NaN or Zero


I have the following list of dataframes. I need to remove from each df the rows that include only the values NaN or Zero. I cannot change all Zeros to NaN, as in other columns, they have a valid meaning rather than reflecting missing/not a number info. Ideally, i would like to combine the commands in this sort of format [x.dropna(axis=0, how='all') for x in dfs]. thank you!

data

df1 = pd.DataFrame(data={'id':[1,2,0,4,5,6], 
                         'a': [41,41,0,43,40,41], 
                         'b': [21,20,0,19,23,23],
                         'c': [0,0,0,0,43,0],
                         'd': [12,11,0,0,0,0]})

df2 = pd.DataFrame(data={'id':[0,2,0,4,5,6], 
                         'a': [0,41,0,43,40,41], 
                         'b': ['NaN',20,'NaN',19,23,23],
                         'c': [0,0,0,0,43,0],
                         'd': [0,11,0,0,0,0]})

df3 = pd.DataFrame(data={'id':[1,2,'NaN','NaN',5,0], 
                         'a': [41,41,0,43,40,0], 
                         'b': [21,20,0,19,23,0],
                         'c': [0,0,0,0,43,0],
                         'd': [12,11,0,0,0,0]})

dfs = [df1,df2,df3]

expected output

[   id   a   b   c   d
 0   1  41  21   0  12
 1   2  41  20   0  11
 2   4  43  19   0   0
 3   5  40  23  43   0
 4   6  41  23   0   0,
    id   a   b   c   d
 0   2  41  20   0  11
 1   4  43  19   0   0
 2   5  40  23  43   0
 3   6  41  23   0   0,
     id   a   b   c   d
 0    1  41  21   0  12
 1    2  41  20   0  11
 2  NaN  43  19   0   0
 3    5  40  23  43   0
 4    0   0   0   0   0]

Solution

  • You can replace 0 to missing values, but better is removed original DataFrames by this repalced one with tested all rows if exist at least one non NaN value in boolean indexing:

    dfs = [x[x.replace(0, np.nan).notna().any(axis=1)] for x in dfs]
    
    print (dfs)
    [   id   a   b   c   d
    0   1  41  21   0  12
    1   2  41  20   0  11
    3   4  43  19   0   0
    4   5  40  23  43   0
    5   6  41  23   0   0,    id   a     b   c   d
    1   2  41  20.0   0  11
    3   4  43  19.0   0   0
    4   5  40  23.0  43   0
    5   6  41  23.0   0   0,     id   a   b   c   d
    0  1.0  41  21   0  12
    1  2.0  41  20   0  11
    3  NaN  43  19   0   0
    4  5.0  40  23  43   0]
    

    If there are only positive values is possible test if sum is not 0:

    dfs = [x[x.sum(axis=1).ne(0)] for x in dfs]