I am taking my first steps with scikit library and found myself in need of backfilling only some columns in my data frame.
I have read carefully the documentation but I still cannot figure out how to achieve this.
To make this more specific, let's say I have:
A = [[7,2,3],[4,np.nan,6],[10,5,np.nan]]
And that I would like to fill in the second column with the mean but not the third. How can I do this with SimpleImputer (or another helper class)?
An evolution from this, and the natural follow up questions is: how can I fill the second column with the mean and the last column with a constant (only for cells that had no values to begin with, obviously)?
There is no need to use the SimpleImputer.
DataFrame.fillna()
can do the work as well
For the second column, use
column.fillna(column.mean(), inplace=True)
For the third column, use
column.fillna(constant, inplace=True)
Of course, you will need to replace column
with your DataFrame's column you want to change and constant
with your desired constant.
Edit
Since the use of inplace
is discouraged and will be deprecated, the syntax should be
column = column.fillna(column.mean())