Tags: python, pandas, machine-learning, evaluation, gridsearchcv

How to pass one DataFrame as training data and another DataFrame as validation data to GridSearchCV


I'm a programmer trying to find my way into the ML world, so this question might be basic.

I have data from the years 2010-2019. I'm trying to test different parameters for gradient boosting regression, and I want to use 60% of the data for training, 20% for validation, and 20% for testing. Due to the nature of the question I'm trying to answer, I have already split the data into train_df (2010 to 2014), evaluate_df (2015 to 2017), and test_df (2018 to 2019).

The model should be trained on train_df and evaluated on evaluate_df. Finally, I want to use the best model on the test DataFrame, test_df.

This is my code:

    p_test3 = {'learning_rate': [0.1, 0.05, 0.01, 0.005], 'n_estimators': [500, 750, 1000, 1250, 1500]}

    tuning = GridSearchCV(estimator=GradientBoostingRegressor(min_samples_split=2, min_samples_leaf=1, subsample=1, max_features='sqrt', random_state=10),
                          param_grid=p_test3, scoring='r2', n_jobs=-1, cv=evaluate_df)
    tuning.fit(train_df[[col1]], train_df['col2'])
    tuning.cv_results_, tuning.best_params_, tuning.best_score_

But I got this error:

ValueError: too many values to unpack (expected 2)

How can I make GridSearchCV evaluate the model on a separate DataFrame?


Solution
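
As far as I can tell, the error comes from how cv is interpreted: besides an integer or a splitter object, it accepts an iterable of (train, test) index pairs. A DataFrame is iterable, but iterating over it yields column labels, so each label ends up being unpacked as if it were such a pair. Below is a minimal sketch of that failure without scikit-learn. The two-column evaluate_df and the helper first_split are hypothetical stand-ins, not part of the question's code.

```python
import pandas as pd

# Hypothetical stand-in for evaluate_df; only the column labels matter here.
evaluate_df = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": [4.0, 5.0, 6.0]})

# Iterating over a DataFrame yields its column labels, not rows or index arrays.
candidate_splits = list(evaluate_df)


def first_split(splits):
    """Unpack the first element the way a (train, test) pair would be unpacked."""
    for train_idx, test_idx in splits:
        return train_idx, test_idx


try:
    first_split(candidate_splits)
    error_message = None
except ValueError as exc:
    # The four-character label "col1" cannot be unpacked into two names.
    error_message = str(exc)

print(candidate_splits)
print(error_message)
```

So cv needs either index pairs or a splitter object, not the validation data itself.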

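  • Combine the two DataFrames, then build a list with one entry per row: -1 for the training rows and 0 for the validation rows. Wrap that list in a PredefinedSplit and pass it to cv. In test_fold, -1 means "never use this row for validation", while a non-negative value is the index of the validation fold the row belongs to. Marking the rows with 0 and 1 instead would create two folds, and one of them would train on evaluate_df and validate on train_df. Note that with the default refit=True, the best model is refit on all of combined_df after the search.

    import pandas as pd
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import GridSearchCV, PredefinedSplit

    combined_df = pd.concat([train_df, evaluate_df])
    # -1 = always in the training set, 0 = validation fold
    test_fold = [-1] * len(train_df) + [0] * len(evaluate_df)

    p_test3 = {'learning_rate': [0.1, 0.05, 0.01, 0.005], 'n_estimators': [500, 750, 1000, 1250, 1500]}
    ps = PredefinedSplit(test_fold=test_fold)
    tuning = GridSearchCV(estimator=GradientBoostingRegressor(min_samples_split=2, min_samples_leaf=1, subsample=1.0, max_features='sqrt', random_state=10),
                          param_grid=p_test3, scoring='r2', n_jobs=-1, cv=ps)
    # Fit on the combined data (same columns as in the question); the split decides which rows train and which validate
    tuning.fit(combined_df[[col1]], combined_df['col2'])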
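
To check that a predefined split behaves as intended, here is a small self-contained sketch on synthetic data. It marks training rows with -1 (never used for validation) and validation rows with 0, so PredefinedSplit should yield exactly one train/validation split. The make_frame helper, the synthetic values, the row counts, and the reduced parameter grid are all assumptions chosen so the sketch runs quickly; they are not the data or grid from the question.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, PredefinedSplit

rng = np.random.default_rng(10)


def make_frame(n_rows):
    """Hypothetical helper: one feature 'col1' and a noisy linear target 'col2'."""
    x = rng.uniform(0, 10, size=n_rows)
    return pd.DataFrame({"col1": x, "col2": 3.0 * x + rng.normal(0, 1, size=n_rows)})


# Synthetic stand-ins for the three year-based frames.
train_df, evaluate_df, test_df = make_frame(60), make_frame(20), make_frame(20)

combined_df = pd.concat([train_df, evaluate_df], ignore_index=True)
# -1: row is never used for validation; 0: row belongs to validation fold 0.
test_fold = [-1] * len(train_df) + [0] * len(evaluate_df)
ps = PredefinedSplit(test_fold=test_fold)

# Inspect the split: training rows come first, validation rows last.
splits = list(ps.split())
train_idx, valid_idx = splits[0]

# Deliberately tiny grid so the sketch finishes in a moment.
param_grid = {"learning_rate": [0.1, 0.05], "n_estimators": [50, 100]}
tuning = GridSearchCV(
    estimator=GradientBoostingRegressor(max_features="sqrt", random_state=10),
    param_grid=param_grid,
    scoring="r2",
    cv=ps,
)
tuning.fit(combined_df[["col1"]], combined_df["col2"])

# With the default refit=True, the best model has been refit on all of combined_df.
test_r2 = r2_score(test_df["col2"], tuning.predict(test_df[["col1"]]))
print(tuning.best_params_, tuning.best_score_, test_r2)
```

Here best_score_ is the R² on the validation rows only, while test_r2 comes from the held-out test_df, which the search never saw. If the final model should be trained on train_df alone, one option is to pass refit=False and fit a fresh estimator with best_params_ yourself.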