I used RandomizedSearchCV (RSCV) with the default 5-fold CV for LGBMClassifier with an evaluation set.
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from lightgbm import LGBMClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model_LGBM = LGBMClassifier(
    objective='binary',
    metric='auc',
    random_state=0,
    early_stopping_round=100,
)
distributions = dict(
    max_depth=range(1, 10),
    num_leaves=[50, 100, 150],
    learning_rate=[0.1, 0.2, 0.3],
)
# Note: the grid above has only 9 * 3 * 3 = 81 combinations, so fewer than
# n_iter=100 distinct candidates can actually be tried.
clf = RandomizedSearchCV(model_LGBM, distributions, random_state=0, n_iter=100, verbose=10)
clf.fit(X_train, y_train, eval_set=[(X_test, y_test)])  # eval_set expects a list of (X, y) pairs
So the output of the RSCV looks like:
First iter: CV 1/5 "valid0's", CV 2/5 "valid0's", ..., CV 5/5 "valid0's";
Second iter: CV 1/5 "valid0's", CV 2/5 "valid0's", ..., CV 5/5 "valid0's";
...
Last iter: CV 1/5 "valid0's", CV 2/5 "valid0's", ..., CV 5/5 "valid0's";
+1 fit with "valid0's"
I suppose the last fit is the refitted best estimator. Does it use the whole training set? And where is the evaluation set used?
According to the docs (linked here): if the refit parameter is True (which it is by default), the model is retrained once at the end with the best parameters found, on the entire dataset passed to fit (the training data, in this case). The keyword arguments you passed to fit, including eval_set, are forwarded to that final refit as well, which is why one extra "valid0's" line appears after the CV iterations.
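You can check the refit behaviour directly. Below is a minimal sketch using a plain DecisionTreeClassifier on synthetic data (both are assumptions chosen so the check does not depend on LightGBM; the refit mechanics live in RandomizedSearchCV itself). A decision tree's root node records how many samples it was fit on, so we can confirm the refitted estimator saw all of X_train, not just 4/5 of it:

```python
# Sketch: RandomizedSearchCV refits best_estimator_ on the full training set.
# DecisionTreeClassifier stands in for LGBMClassifier here (assumption).
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": range(1, 10)},
    n_iter=5,
    random_state=0,
    refit=True,  # the default: one final fit with the best params
)
search.fit(X_train, y_train)

# The root node of the refitted tree counts every training sample (400 here),
# so the final fit used the whole training set, not a 4/5 CV split.
print(search.best_estimator_.tree_.n_node_samples[0])  # → 400
```

The same forwarding applies to fit kwargs: whatever you pass to search.fit (such as eval_set) is handed to each CV fit and to this final refit.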