machine-learning scikit-learn sklearn-pandas

Is it possible to change pandas column data type within a sklearn pipeline?


The sklearn pipeline I am using has multiple transformers, but one of the early transformers returns numerical columns and the next one expects object-type variables.

Basically, I need to squeeze in a:

    data[col] = data[col].astype(object)

for the required columns within the pipeline.

Is there any way to do it?

Note: I am using Feature-engine transformers.


Solution

  • Yes, you can use sklearn.preprocessing.FunctionTransformer. A simple example:

    import pandas as pd
    from sklearn.preprocessing import FunctionTransformer

    def to_object(x):
        # Cast every column to object dtype
        return pd.DataFrame(x).astype(object)

    fun_tr = FunctionTransformer(to_object)

    y = fun_tr.fit_transform(pd.DataFrame({'a': [1, 2, 3]}))
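
Since the question asks about casting only the required columns inside a pipeline, here is a minimal sketch of how the same idea might be placed in a Pipeline step. The helper name `cast_columns` and the column names `a` and `b` are hypothetical, and the Feature-engine transformers from the question are left out, so this only illustrates the casting step under those assumptions.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def cast_columns(df, columns, dtype=object):
    """Return a new DataFrame with only the given columns cast to dtype."""
    # DataFrame.astype with a dict casts the listed columns and leaves the rest alone.
    return df.astype({col: dtype for col in columns})


# Hypothetical column list; kw_args forwards it to cast_columns on every call.
to_object = FunctionTransformer(cast_columns, kw_args={"columns": ["a"]})

pipe = Pipeline([
    # Transformers that produce numeric output would come before this step.
    ("to_object", to_object),
    # Transformers that expect object-type columns would come after it.
])

X = pd.DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
X_out = pipe.fit_transform(X)
print(X_out.dtypes)
```

FunctionTransformer leaves `validate=False` by default, so the DataFrame is passed to the function as-is rather than being converted to a NumPy array, which is what keeps the column names and per-column dtypes available to the next step.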