Early stopping with Pycaret? Overfitting with Catboost and XGBoost

I'm comparing the performance of Catboost, XGBoost and LinearRegression in Pycaret. Catboost and XGBoost are untuned.

So far I see that Catboost and XGBoost are overfitting.

For linear regression, the train/test scores are train R2: 0.72 and test R2: 0.65.

Is there a way to set early stopping for XGBoost and Catboost to avoid this overfitting? Or are there other parameters to tune in Pycaret to avoid overfitting?


  • There are several ways to reduce overfitting:

    • Feature selection (can be enabled in the setup): there are two types, plus a variable threshold; alternatively, use RFE (recursive feature elimination) or SHAP
    • Tune both models, Catboost and XGBoost (or the other tree-based algorithms)
    • Try different values of n_estimators, e.g. 100, 500, or 1000
    • Run the algorithms several times
    • Change the train/test split, e.g. 80/20 or 70/30
    • Remove correlated inputs
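
On the early stopping part of the question: both XGBoost and CatBoost have an `early_stopping_rounds` setting that works against a validation set, but how to pass it through Pycaret depends on the version, so that is not shown here. As a rough illustration of the rule itself, here is a minimal sketch in plain Python. The function name `best_round_with_patience` and the RMSE values are made up for the example; this is not the libraries' implementation.

```python
from typing import Sequence


def best_round_with_patience(val_errors: Sequence[float], patience: int) -> int:
    """Return the 0-based round with the lowest validation error seen
    before training stops.

    Training stops once the validation error has not improved for
    `patience` consecutive rounds.
    """
    if patience < 1:
        raise ValueError("patience must be >= 1")
    if not val_errors:
        raise ValueError("val_errors must not be empty")

    best_round = 0
    best_error = float("inf")
    for round_idx, err in enumerate(val_errors):
        if err < best_error:
            # New best validation error: remember where it happened.
            best_error = err
            best_round = round_idx
        elif round_idx - best_round >= patience:
            # No improvement for `patience` rounds: stop early.
            break
    return best_round


# Hypothetical validation RMSE per boosting round: it improves, then
# drifts upward for a while as the model starts to overfit.
val_rmse = [0.90, 0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.54, 0.53]

print(best_round_with_patience(val_rmse, patience=3))  # prints 3
print(best_round_with_patience(val_rmse, patience=5))  # prints 8
```

With a patience of 3 the run stops after round 6 and keeps round 3. A larger patience lets training ride out the temporary bump and reach round 8, so the patience value is itself something to tune.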