Search code examples
pythonpandasrows

drop all rows when all cells are either zero or nan


I need to remove all rows from a dataframe when all cells in each column for that row are either 0 or 'nan' (other than ID column which will be some unique value) I have tried the command below but no luck. The expected output is also pasted below. Thank you in advance!

data

df = pd.DataFrame({'id':[1,2,3,4],'a':[0,0,0,1], 'b':[0,1,'nan',1], 'c':['nan',1,0,1]})

    id  a   b   c
0   1   0   0   nan
1   2   0   1   1
2   3   0   nan 0
3   4   1   1   1

command

df.loc[df[(df != 0) & (df != ('nan'))].all(axis=1)]

expected output

    id  a   b   c
0   2   0   1   1
1   4   1   1   1

Solution

  • Your solution should be changed by select all columns without first with DataFrame.all:

    df1 = df.iloc[:, 1:]
    df2 = df[(df1 != 0).all(axis=1) | (df1 != 'nan').all(axis=1)]
    

    Working like DataFrame.isin:

    df2 = df[~df.iloc[:, 1:].isin([0, 'nan']).all(axis=1)]
    

    If NaN are missing values is necessary test them by DataFrame.notna:

    df1 = df.iloc[:, 1:]
    df2 = df[(df1 != 0).all(axis=1) | df1.notna().all(axis=1)]
    

    Or:

    df2 = df[~df.iloc[:, 1:].fillna('nan').isin([0, 'nan']).all(axis=1)]