Tags: machine-learning, xgboost, hyperparameters, tpot

TPOT for hyperparameter tuning


I want to use TPOT for hyperparameter tuning of a model. I know that TPOT can give me the best machine learning pipeline with the best hyperparameters, but in my case I already have a pipeline and I just want to tune its parameters.

My pipeline is as follows:

from sklearn.pipeline import make_pipeline
from sklearn.linear_model import SGDRegressor
from sklearn.feature_selection import SelectPercentile, f_regression
from tpot.builtins import StackingEstimator, OneHotEncoder
from xgboost import XGBRegressor

exported_pipeline = make_pipeline(
    StackingEstimator(estimator=SGDRegressor(alpha=0.001, eta0=0.1, fit_intercept=False, l1_ratio=1.0, learning_rate="constant", loss="epsilon_insensitive", penalty="elasticnet", power_t=10.0)),
    SelectPercentile(score_func=f_regression, percentile=90),
    OneHotEncoder(minimum_fraction=0.2, sparse=False, threshold=10),
    XGBRegressor(learning_rate=0.1, max_depth=10, min_child_weight=1, n_estimators=100, n_jobs=1, objective="reg:squarederror", subsample=0.45, verbosity=0),
)

Please tell me a way to tune the hyperparameters, and if it is not possible with TPOT, please suggest an alternative library for this. Thank you.


Solution

    1. TPOT optimizes pipelines and hyperparameters together. Since it uses a genetic algorithm, you can run it several times with different random seeds to see whether it finds a better combination of pipeline and hyperparameters, or use different population settings (see the first sketch after this list).

    2. If you don't want the pipeline structure to change, import the exported pipeline into scikit-learn and do something similar to what TPOT does: hyperparameters of an sklearn Pipeline are easy to tune.

    Here is an example: https://medium.com/@kocur4d/hyper-parameter-tuning-with-pipelines-5310aff069d6 -- search (Ctrl+F) for "grid_params" and see how it is configured. You can even seed the grid with the hyperparameter values that TPOT exported for your pipeline.

    If the pipeline is not big (and you already have dictionaries of values to tune), use GridSearchCV (see the second sketch after this list).

    If the pipeline is big or the hyperparameter space has a lot of options, consider https://sklearn-nature-inspired-algorithms.readthedocs.io/en/latest/introduction/nature-inspired-search-cv.html (NatureInspiredSearchCV). It has a similar API, and you can use the 'runs' setting to perform several independent optimization runs. You can also adjust the population settings so the search does not get stuck in local optima (see the third sketch after this list).
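
    For point 1, a minimal sketch of re-running the TPOT search with different seeds and population settings. It assumes a regression task with X_train, y_train, X_test, y_test already defined; those names and the specific generation/population values are placeholders, not recommendations:

from tpot import TPOTRegressor

# Try several random seeds; each run of the genetic search may converge
# to a different pipeline/hyperparameter combination.
for seed in (0, 1, 2):
    tpot = TPOTRegressor(
        generations=20,       # how many rounds the genetic algorithm evolves
        population_size=100,  # larger populations explore more candidates per round
        random_state=seed,
        verbosity=2,
        n_jobs=-1,
    )
    tpot.fit(X_train, y_train)
    print(f"seed={seed} test score={tpot.score(X_test, y_test):.4f}")
    tpot.export(f"tpot_pipeline_seed_{seed}.py")  # writes the best pipeline as Python code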
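
    For point 2, a sketch of tuning the fixed pipeline with GridSearchCV. The step names below follow make_pipeline's convention of lowercased class names, but check exported_pipeline.get_params().keys() for the exact names; the value ranges in the grid are illustrative guesses:

from sklearn.model_selection import GridSearchCV

# Keys use the "<step name>__<parameter>" convention; with make_pipeline the
# step names are the lowercased class names of each estimator.
param_grid = {
    "selectpercentile__percentile": [80, 90, 100],
    "xgbregressor__max_depth": [6, 10, 14],
    "xgbregressor__n_estimators": [100, 300],
    "xgbregressor__subsample": [0.45, 0.7, 1.0],
}

search = GridSearchCV(
    exported_pipeline,
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_)
print(search.best_score_)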
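
    For the large-search-space case, a sketch with NatureInspiredSearchCV from sklearn-nature-inspired-algorithms. The constructor arguments follow the library's documented example (algorithm, population_size, max_n_gen, max_stagnating_gen, runs), but double-check them against the docs for the version you install; param_grid is the same dictionary as in the GridSearchCV sketch:

from sklearn_nature_inspired_algorithms.model_selection import NatureInspiredSearchCV

nia_search = NatureInspiredSearchCV(
    exported_pipeline,
    param_grid,               # same "<step>__<param>" grid as above
    algorithm="hba",          # hybrid bat algorithm
    population_size=50,       # a larger population helps avoid local optima
    max_n_gen=25,
    max_stagnating_gen=10,
    runs=3,                   # several independent optimization runs
    random_state=42,
)
nia_search.fit(X_train, y_train)
print(nia_search.best_params_)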