Tags: python, scikit-learn, pipeline

Extracting feature importances along with column names from sklearn pipeline


I have a scikit-learn pipeline with two steps: a ColumnTransformer preprocessor that applies a OneHotEncoder, and a RandomForestRegressor estimator. I would like to get the feature names of the encoded columns after one-hot encoding. My pipeline looks like this:

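from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# categorical_columns is assumed to be a list of the categorical column names
categorical_preprocessor = OneHotEncoder(handle_unknown="ignore")

# Model processor: one-hot encode the categorical columns, pass the rest through
preprocessor = ColumnTransformer(
    [('categorical', categorical_preprocessor, categorical_columns)],
    remainder="passthrough")

est = RandomForestRegressor(n_estimators=100, random_state=0)

pipe = make_pipeline(preprocessor, est)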

I am trying to get the feature names of the encoded columns like this:

pipe['preprocessor'].transformers[0][0].get_feature_names(categorical_columns)

But I get this error:

'str' object has no attribute 'get_feature_names'


Solution

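  • Since scikit-learn 1.0, the feature names produced by the preprocessing steps can be read from the fitted pipeline by slicing off the final estimator and calling get_feature_names_out. The original attempt fails because transformers[0][0] is the name of the transformer (the string 'categorical'), not the encoder itself:

    pipe[:-1].get_feature_names_out()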