I have a set of predictor variables (X) and an outcome variable (y) taken from my df. The df has hundreds of variables, but I only care about the few selected below.
X = df[['a', 'b', 'c']]
y = df['d']
I then want to delete all of the rows with missing data for any of my "X" variables, so I ran this:
for i in X:
    df = df[df[i].notna()]
This leaves me with a modified df that has no missing values in the columns of interest. However, X and y were built from the old df and still contain the rows I removed, so I cannot use them as inputs to my model. I know I could copy and paste the code that created X and y to "refresh" them, but that seems inefficient, and I cannot think of a better way. Thoughts appreciated!
You can use df.dropna:
X = X.dropna()
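One caveat: calling dropna on X alone removes rows from X but not from y, so the two can end up with different lengths. Below is a minimal sketch of one way to keep them aligned, assuming a toy DataFrame that reuses the column names from the question. The data, the extra column 'e', and the names predictors, outcome, and clean are made up for illustration. It filters df once with the subset argument of dropna, then derives X and y from the filtered frame.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration); 'e' stands in for the many unused columns.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [1.0, 2.0, np.nan, 4.0],
    "c": [1.0, 2.0, 3.0, 4.0],
    "d": [0, 1, 0, 1],
    "e": [np.nan, np.nan, np.nan, np.nan],
})

predictors = ["a", "b", "c"]  # hypothetical name for the X columns
outcome = "d"                 # hypothetical name for the y column

# Drop rows that have a missing value in any predictor column.
# Missing values in other columns (such as 'e') are ignored.
clean = df.dropna(subset=predictors)

# Build X and y from the same filtered frame so their rows match.
X = clean[predictors]
y = clean[outcome]

# Alternative when X and y already exist: drop on X, then realign y by index.
X_alt = df[predictors].dropna()
y_alt = df[outcome].loc[X_alt.index]
```

Keeping the column names in one list means the selection is written once and can be reused after any further filtering. The index-based alternative assumes the DataFrame index has no duplicate labels.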