Tags: python, model-evaluation, pycaret

Early stopping with PyCaret? Overfitting with CatBoost and XGBoost


I'm comparing the performance of CatBoost, XGBoost, and LinearRegression in PyCaret. CatBoost and XGBoost are untuned.

So far I can see that CatBoost and XGBoost are overfitting.

[Image: PyCaret comparison of train/test scores for CatBoost and XGBoost]

For linear regression, the train/test scores are train R²: 0.72, test R²: 0.65.

Is there a way to set early stopping for XGBoost and CatBoost to avoid this overfitting? Or are there other parameters to tune in PyCaret to avoid overfitting?


Solution

  • There are several ways to avoid overfitting:

    • Feature selection (can be enabled in setup()) - for example a variable threshold, RFE (recursive feature elimination), or SHAP-based selection; see the first sketch after this list.
    • Tune both CatBoost and XGBoost (or the other tree algorithms), e.g. with tune_model().
    • Increase n_estimators (e.g. 100, 500, or 1000).
    • Run the algorithms several times (e.g. with different random seeds) and compare the scores.
    • Change the train/test split (80/20, 70/30, etc.).
    • Remove correlated inputs (remove_multicollinearity in setup()).
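
A minimal sketch of how several of these options map onto PyCaret's regression setup(), assuming a pandas DataFrame `df` with a numeric target column named `'target'` (both placeholder names) and an illustrative correlation threshold:

```python
from pycaret.regression import setup, create_model, tune_model

# Placeholder data: df is a pandas DataFrame, 'target' its numeric target.
exp = setup(
    data=df,
    target='target',
    train_size=0.8,                   # change the split: 0.8 -> 0.7 for 70/30
    feature_selection=True,           # built-in feature selection
    remove_multicollinearity=True,    # drop highly correlated inputs
    multicollinearity_threshold=0.9,  # correlation cutoff (illustrative value)
    session_id=42,                    # fix the seed; vary it across repeated runs
)

xgb = create_model('xgboost')   # untuned baseline
xgb_tuned = tune_model(xgb)     # cross-validated hyperparameter search
```

Comparing the train/test gap before and after tune_model() is usually the quickest check of whether the tuning actually reduced the overfit.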
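As for the early-stopping part of the question: one possible route is create_model()'s fit_kwargs, which are forwarded to the estimator's fit(). Below is a hedged sketch using CatBoost's native eval_set / early_stopping_rounds arguments; reusing PyCaret's hold-out split as the eval set is only for illustration, since in practice you would carve out a separate validation set to avoid leaking the test data:

```python
from pycaret.regression import setup, create_model, get_config

exp = setup(data=df, target='target', session_id=42)

# PyCaret's internal hold-out split, used here only to show the mechanics -
# monitoring the test set during training leaks it into model selection.
X_val = get_config('X_test')
y_val = get_config('y_test')

cb = create_model(
    'catboost',
    fit_kwargs={
        'eval_set': (X_val, y_val),   # validation data monitored during boosting
        'early_stopping_rounds': 50,  # stop after 50 rounds without improvement
    },
)
```

XGBoost supports the same idea through its scikit-learn wrapper, though whether early_stopping_rounds is passed to fit() or to the constructor depends on the xgboost version.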