python machine-learning refresh

Updating a list of df variables after modifying a df


I have a set of predictor variables (X) and an outcome variable (y) taken from my df. The df has hundreds of columns, but I only care about the few below.

X = df[['a', 'b', 'c']]

y = df['d']

I then want to delete all of the rows with missing data for any of my "X" variables, so I ran this:

for i in X:
    df = df[df[i].notna()]

This leaves me with a modified df that has no missing values in the columns of interest. However, X and y still hold the rows from the old df, so I cannot use them as inputs to my model. I know I could copy and paste the code that created them to "refresh" them, but that seems inefficient, and I cannot think of a better way. Thoughts appreciated!


Solution

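  • You can use DataFrame.dropna to drop the incomplete rows directly from X, then select the same rows of y so the two stay aligned (this assumes df has a unique index):

    X = X.dropna()
    y = y.loc[X.index]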
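
Another option, sketched here under some assumptions, is to filter df once with the subset argument of dropna and derive both X and y from the filtered frame, so they cannot drift apart. The toy data, the extra column e, and the helper name make_xy are made up for illustration; only the column names a, b, c and d come from the question.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real df (values are invented for illustration).
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [1.0, 2.0, np.nan, 4.0],
    "c": [1.0, 2.0, 3.0, 4.0],
    "d": [0, 1, 0, 1],
    "e": [np.nan, 5.0, 6.0, 7.0],  # unrelated column; its NaN is ignored
})

predictors = ["a", "b", "c"]
outcome = "d"


def make_xy(frame, predictors, outcome):
    """Hypothetical helper: drop rows with a missing predictor, then split."""
    # subset= restricts the NaN check to the predictor columns only.
    clean = frame.dropna(subset=predictors)
    return clean[predictors], clean[outcome]


X, y = make_xy(df, predictors, outcome)
print(X.index.tolist())  # [0, 3]
print(y.tolist())        # [0, 1]
```

Because the column names live in one list, "refreshing" X and y after any later change to df is a single call to make_xy rather than a copy and paste of the selection code.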