I have a dataframe with 9 columns (long cryptic column names) and ~1000 rows. I want to drop/delete merely the rows where the values in column 2 and 3 are NaN.
I know there is DataFrame.dropna(subset=[])
but I don't want to select the columns by label but by position.
Is there a way to do it? Renaming the columns is no option.
SOLUTION for me:
df.dropna(subset=df.columns[[2,3]], how='all')
EDIT: This are some rows of my dataframe: (the important columns are Temp. and Event)
Datum Zeit Zeitzone \
Anz.
926 2019-09-17 09:00:00 GMT+02:00
927 2019-09-17 10:00:00 GMT+02:00
928 2019-09-17 11:00:00 GMT+02:00
929 2019-09-17 12:00:00 GMT+02:00
930 2019-09-17 13:00:00 GMT+02:00
931 2019-09-17 13:10:32 GMT+02:00
932 2019-09-17 13:11:21 GMT+02:00
933 2019-09-17 13:11:41 GMT+02:00
934 2019-09-17 13:11:42 GMT+02:00
935 2019-09-17 13:11:53 GMT+02:00
Temp., °C (LGR S/N: 20009596, SEN S/N: 20009596, LBL: 1m) \
Anz.
926 12.401
927 13.942
928 15.760
929 15.569
930 15.187
931 NaN
932 NaN
933 NaN
934 NaN
935 NaN
Event, units (LGR S/N: 20009596, SEN S/N: 20009596, LBL: 0,2 mm Niederschlag) \
Anz.
926 NaN
927 NaN
928 NaN
929 NaN
930 NaN
931 163.0
932 NaN
933 NaN
934 NaN
935 163.0
Notiz Koppler verbunden (LGR S/N: 20009596) \
Anz.
926 NaN
927 NaN
928 NaN
929 NaN
930 NaN
931 NaN
932 Protokolliert
933 NaN
934 Protokolliert
935 NaN
Koppler abgetrennt (LGR S/N: 20009596) \
Anz.
926 NaN
927 NaN
928 NaN
929 NaN
930 NaN
931 NaN
932 NaN
933 Protokolliert
934 NaN
935 NaN
Host verbunden (LGR S/N: 20009596) Angehalten (LGR S/N: 20009596) \
Anz.
926 NaN NaN
927 NaN NaN
928 NaN NaN
929 NaN NaN
930 NaN NaN
931 NaN NaN
932 NaN NaN
933 NaN NaN
934 Protokolliert NaN
935 NaN Protokolliert
Dateiende (LGR S/N: 20009596)
Anz.
926 NaN
927 NaN
928 NaN
929 NaN
930 NaN
931 NaN
932 NaN
933 NaN
934 NaN
935 Protokolliert
And this is how I want it to be (I left the index how it was to make it more visible):
Datum Zeit Zeitzone \
Anz.
926 2019-09-17 09:00:00 GMT+02:00
927 2019-09-17 10:00:00 GMT+02:00
928 2019-09-17 11:00:00 GMT+02:00
929 2019-09-17 12:00:00 GMT+02:00
930 2019-09-17 13:00:00 GMT+02:00
931 2019-09-17 13:10:32 GMT+02:00
935 2019-09-17 13:11:53 GMT+02:00
Temp., °C (LGR S/N: 20009596, SEN S/N: 20009596, LBL: 1m) \
Anz.
926 12.401
927 13.942
928 15.760
929 15.569
930 15.187
931 NaN
935 NaN
Event, units (LGR S/N: 20009596, SEN S/N: 20009596, LBL: 0,2 mm Niederschlag) \
Anz.
926 NaN
927 NaN
928 NaN
929 NaN
930 NaN
931 163.0
935 163.0
Notiz Koppler verbunden (LGR S/N: 20009596) \
Anz.
926 NaN
927 NaN
928 NaN
929 NaN
930 NaN
931 NaN
935 NaN
Koppler abgetrennt (LGR S/N: 20009596) \
Anz.
926 NaN
927 NaN
928 NaN
929 NaN
930 NaN
931 NaN
935 NaN
Host verbunden (LGR S/N: 20009596) Angehalten (LGR S/N: 20009596) \
Anz.
926 NaN NaN
927 NaN NaN
928 NaN NaN
929 NaN NaN
930 NaN NaN
931 NaN NaN
935 NaN Protokolliert
Dateiende (LGR S/N: 20009596)
Anz.
926 NaN
927 NaN
928 NaN
929 NaN
930 NaN
931 NaN
935 Protokolliert
Instead of deleting the rows that you do not want, try keeping those that you want:
df[df.iloc[:,[2,3]].notnull().all(axis=1)]
But what is wrong with getting the column names by index?
df.dropna(subset=df.columns[[2,3]])