python pandas scikit-learn xgboost

Using fit_params with an sklearn Pipeline for training


I am using XGBClassifier from the xgboost library in an sklearn Pipeline, but whenever I try to pass one of the **fit_params the way the documentation describes (https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline.fit), I get a KeyError.

from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

xgb_model = XGBClassifier(eval_metric='logloss', use_label_encoder=False)
pipeline = Pipeline([("preproc", preprocesser), ("classifier", xgb_model)])
pipeline.fit(
    X_train, y_train, train_model__eval_set=[(X_valid_transformed, y_valid)]
)

I got:

KeyError: 'train_model'

Solution

  • From the sklearn.pipeline docs:

    ...
    **fit_params : dict of string -> object
      Parameters passed to the fit method of each step,
      where each parameter name is prefixed such that
      parameter p for step s has key s__p
    ...
    

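    The prefix s must be the name you gave the step when building the Pipeline. Your steps are named "preproc" and "classifier"; there is no step called "train_model", hence the KeyError. So, for your code, you need: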

                                                          |
                                                          |
                                                          v
                                                       ________
                                                      |        |
    pipeline = Pipeline([("preproc", preprocesser), ("classifier", xgb_model)])
    pipeline.fit(
        X_train, y_train, classifier__eval_set=[(X_valid_transformed, y_valid)]
    )                     |________|
                              ^
                              |
                              |
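
If it helps to see why the wrong prefix ends in a KeyError, here is a minimal sketch of the s__p routing described in the docs. It is a simplified illustration only, not scikit-learn's actual code; route_fit_params is a hypothetical helper and the strings "X_valid" and "y_valid" are placeholders for real data.

```python
# Simplified illustration of "step__param" routing.
# NOT scikit-learn's implementation; route_fit_params is a hypothetical helper.

def route_fit_params(step_names, fit_params):
    """Group fit params by pipeline step, based on the 'step__param' prefix."""
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        if "__" not in key:
            raise ValueError(f"expected 'step__param', got {key!r}")
        # Split only on the first "__": the left part is the step name.
        step, param = key.split("__", 1)
        # A prefix that is not a step name raises KeyError here.
        routed[step][param] = value
    return routed


steps = ["preproc", "classifier"]

# Correct prefix: the param is routed to the "classifier" step.
print(route_fit_params(steps, {"classifier__eval_set": [("X_valid", "y_valid")]}))

# Wrong prefix: "train_model" is not a step name.
try:
    route_fit_params(steps, {"train_model__eval_set": []})
except KeyError as exc:
    print("KeyError:", exc)
```

The second call fails for the same reason as the code in the question: the part before the double underscore is looked up among the step names, and "train_model" is not one of them.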