I need to remove all rows from a dataframe when all cells in each column for that row are either 0 or 'nan' (other than ID column which will be some unique value) I have tried the command below but no luck. The expected output is also pasted below. Thank you in advance!
data
df = pd.DataFrame({'id':[1,2,3,4],'a':[0,0,0,1], 'b':[0,1,'nan',1], 'c':['nan',1,0,1]})
id a b c
0 1 0 0 nan
1 2 0 1 1
2 3 0 nan 0
3 4 1 1 1
command
df.loc[df[(df != 0) & (df != ('nan'))].all(axis=1)]
expected output
id a b c
0 2 0 1 1
1 4 1 1 1
Your solution should be changed by select all columns without first with DataFrame.all
:
df1 = df.iloc[:, 1:]
df2 = df[(df1 != 0).all(axis=1) | (df1 != 'nan').all(axis=1)]
Working like DataFrame.isin
:
df2 = df[~df.iloc[:, 1:].isin([0, 'nan']).all(axis=1)]
If NaN
are missing values is necessary test them by DataFrame.notna
:
df1 = df.iloc[:, 1:]
df2 = df[(df1 != 0).all(axis=1) | df1.notna().all(axis=1)]
Or:
df2 = df[~df.iloc[:, 1:].fillna('nan').isin([0, 'nan']).all(axis=1)]