I have a dataframe with a lot of NaNs.
y
columns mean the count of events, val
means values of each event in that yeat, and total
means a multiplication of both columns.
Many columns have zeros and many have NaNs because values
are not available (up to 80% of data is missing) is 4 columns.
y17 y18 y19 y20 val17 va18 val19 val20 total17 total18 total19 total20
1 2 1 2 2 2 2 2 1 4 2 4
2 2 2 2 2 2 2 2 4 4 4 4
3 3 3 3 NaN NaN NaN NaN NaN NaN NaN NaN
0 0 0 0 1 2 3 4 0 0 0 0
0 0 0 0 NaN NaN NaN NaN NaN NaN NaN NaN
I want to keep rows with all values with zeros and numbers AND I want to keep rows where first four columns (multiple condition) have zeros.
Expected output
y17 y18 y19 y20 val17 va18 val19 val20 total17 total18 total19 total20
1 2 1 2 2 2 2 2 1 4 2 4
2 2 2 2 2 2 2 2 4 4 4 4
0 0 0 0 1 2 3 4 0 0 0 0
0 0 0 0 NaN NaN NaN NaN NaN NaN NaN NaN
Thanks!
Just pass the condition with all
out = df[df.iloc[:,:4].eq(0).all(1) | df.notna().all(1)]
Out[386]:
y17 y18 y19 y20 val17 ... val20 total17 total18 total19 total20
0 1 2 1 2 2.0 ... 2.0 1.0 4.0 2.0 4.0
1 2 2 2 2 2.0 ... 2.0 4.0 4.0 4.0 4.0
3 0 0 0 0 1.0 ... 4.0 0.0 0.0 0.0 0.0
4 0 0 0 0 NaN ... NaN NaN NaN NaN NaN
[4 rows x 12 columns]