python pandas scikit-learn data-science imputation

How to transform some columns only with SimpleImputer or equivalent


I am taking my first steps with the scikit-learn library and need to fill in missing values in only some of the columns of my data frame.

I have read the documentation carefully, but I still cannot figure out how to do this.

To make this more specific, let's say I have:

import numpy as np

A = [[7, 2, 3], [4, np.nan, 6], [10, 5, np.nan]]

And that I would like to fill in the second column with the mean but not the third. How can I do this with SimpleImputer (or another helper class)?

A natural follow-up question: how can I fill the second column with the mean and the last column with a constant (only for cells that were missing to begin with, obviously)?
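A minimal sketch of doing this with scikit-learn alone, assuming the column positions from the example above: each SimpleImputer is wrapped in a ColumnTransformer, which applies it only to the listed column indices, while remainder="passthrough" keeps the other columns untouched (including any NaN they contain). The fill value 0 for the third column is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

X = np.array([[7, 2, 3], [4, np.nan, 6], [10, 5, np.nan]])

# Case 1: impute only the second column (index 1) with its mean;
# columns 0 and 2 are passed through as-is, so the NaN in column 2 stays.
only_mean = ColumnTransformer(
    [("mean_col", SimpleImputer(strategy="mean"), [1])],
    remainder="passthrough",
)
partial = only_mean.fit_transform(X)

# Case 2: mean for column 1, a constant (assumed 0 here) for column 2.
mean_and_const = ColumnTransformer(
    [
        ("mean_col", SimpleImputer(strategy="mean"), [1]),
        ("const_col", SimpleImputer(strategy="constant", fill_value=0), [2]),
    ],
    remainder="passthrough",
)
filled = mean_and_const.fit_transform(X)
```

Note that ColumnTransformer outputs the transformed columns first, in the order the transformers are listed, followed by the passthrough columns, so the original column order is not preserved (here the first output column is the imputed second input column).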


Solution

  • There is no need to use SimpleImputer here.
    Once the data is in a pandas DataFrame, fillna() can do the job.

    • For the second column, use

      column.fillna(column.mean(), inplace=True)

    • For the third column, use

      column.fillna(constant, inplace=True)

    Replace column with the DataFrame column you want to change (for example df["b"]) and constant with the value you want to fill in.


    Edit
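    Since inplace is discouraged (and, with pandas' Copy-on-Write behavior, calling it on a column selected from a DataFrame may not update the DataFrame at all), assign the result back to the DataFrame column instead:

    df["col"] = df["col"].fillna(df["col"].mean())
    df["col"] = df["col"].fillna(constant)

    where df is your DataFrame and "col" is the column name. Assigning to a standalone variable (column = column.fillna(...)) would leave df unchanged.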
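A self-contained sketch of the pandas approach above, using the example data from the question. The column names a, b, c and the constant 0 are assumptions for illustration; the question only refers to columns by position. It also shows the single-call variant where fillna() takes a dict mapping column names to fill values.

```python
import numpy as np
import pandas as pd

A = [[7, 2, 3], [4, np.nan, 6], [10, 5, np.nan]]
# Hypothetical column names; the question only uses positions.
raw = pd.DataFrame(A, columns=["a", "b", "c"])
constant = 0  # hypothetical fill value for the third column

df = raw.copy()
# Second column: NaN -> column mean (mean() skips NaN by default).
df["b"] = df["b"].fillna(df["b"].mean())
# Third column: NaN -> constant. Skip this line to leave column c as-is.
df["c"] = df["c"].fillna(constant)

# Equivalent single call: columns missing from the dict are left unchanged.
df_alt = raw.fillna({"b": raw["b"].mean(), "c": constant})
```

Because fillna() returns a new object, raw keeps its missing values; only the assigned result changes.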