Search code examples
pythonpandasnandrop

Drop rows with NaNs from pandas dataframe based on multiple conditions


I have a dataframe with a lot of NaNs.

y columns mean the count of events, val means values of each event in that yeat, and total means a multiplication of both columns.

Many columns have zeros and many have NaNs because values are not available (up to 80% of data is missing) is 4 columns.

y17 y18 y19 y20 val17 va18 val19 val20 total17 total18 total19 total20
 1   2   1   2    2    2     2    2       1      4        2      4
 2   2   2   2    2    2     2    2       4      4        4      4
 3   3   3   3   NaN  NaN   NaN  NaN     NaN    NaN      NaN    NaN
 0   0   0   0    1    2     3    4       0      0        0      0
 0   0   0   0   NaN  NaN   NaN  NaN     NaN    NaN      NaN    NaN

I want to keep rows with all values with zeros and numbers AND I want to keep rows where first four columns (multiple condition) have zeros.

Expected output

y17 y18 y19 y20 val17 va18 val19 val20 total17 total18 total19 total20
 1   2   1   2    2    2     2    2       1      4        2      4
 2   2   2   2    2    2     2    2       4      4        4      4
 0   0   0   0    1    2     3    4       0      0        0      0
 0   0   0   0   NaN  NaN   NaN  NaN     NaN    NaN      NaN    NaN

Thanks!


Solution

  • Just pass the condition with all

    out = df[df.iloc[:,:4].eq(0).all(1) | df.notna().all(1)]
    Out[386]: 
       y17  y18  y19  y20  val17  ...  val20  total17  total18  total19  total20
    0    1    2    1    2    2.0  ...    2.0      1.0      4.0      2.0      4.0
    1    2    2    2    2    2.0  ...    2.0      4.0      4.0      4.0      4.0
    3    0    0    0    0    1.0  ...    4.0      0.0      0.0      0.0      0.0
    4    0    0    0    0    NaN  ...    NaN      NaN      NaN      NaN      NaN
    [4 rows x 12 columns]