Tags: machine-learning, python, random-forest

Hyperparameter optimisation in Python with a separate validation set


I am trying to optimise the hyperparameters of a random forest regressor in Python.

I have 3 separate datasets: train/validate/test. Therefore, rather than using cross-validation, I want to tune the hyperparameters against the specific validation set, i.e. the "First Approach" described in this stackoverflow post.

Now, sklearn has some nice built-in methods for hyperparameter optimisation using cross-validation (e.g. this tutorial), but what if I want to tune my hyperparameters with a specific validation set? Is it still possible to use a method like RandomizedSearchCV?


Solution

  • It is indeed possible, via the cv option. As the documentation states, one of the accepted inputs is an iterable of train/test index tuples:

    An iterable yielding (train, test) splits as arrays of indices.

    So a list of size one, holding the train and validation indices packed as a tuple, will do the job.
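
    A minimal sketch of that approach (X_train, y_train, X_valid, y_valid are assumed placeholder names for your own train and validation sets; the random arrays below just stand in so the snippet runs):

        import numpy as np
        from scipy.stats import randint
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import RandomizedSearchCV

        rng = np.random.default_rng(0)
        X_train, y_train = rng.random((100, 5)), rng.random(100)  # stand-in data;
        X_valid, y_valid = rng.random((30, 5)), rng.random(30)    # use your real split

        # Stack train and validation so the search receives one array,
        # then describe the single split by position.
        X = np.concatenate([X_train, X_valid])
        y = np.concatenate([y_train, y_valid])
        train_idx = np.arange(len(X_train))
        valid_idx = np.arange(len(X_train), len(X))

        # A one-element list of (train, validation) index arrays --
        # exactly the "iterable yielding (train, test) splits" from the docs.
        cv = [(train_idx, valid_idx)]

        search = RandomizedSearchCV(
            RandomForestRegressor(random_state=0),
            param_distributions={
                "n_estimators": randint(50, 500),
                "max_depth": randint(2, 20),
            },
            n_iter=20,
            cv=cv,
            random_state=0,
        )
        search.fit(X, y)
        print(search.best_params_)

    Two things to keep in mind: with the default refit=True, best_estimator_ is refit on everything passed to fit (train plus validation combined), so pass refit=False if you want to refit on the training set alone before touching the test set; and sklearn also ships PredefinedSplit, which builds the same single split from a test_fold array if you would rather not manage the indices by hand.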