As per the documentation of RandomizedSearchCV and GridSearchCV modules of sklearn, they support only the fit
method for the classifier which is passed to them and doesn't support the partial_fit
method of the classifiers which can be used for training on an incremental basis. Currently, I am trying to use SGDClassifier
which can be trained on incremental data using the partial_fit
method and also find the best set of hyper-parameters for the same. I was just wondering why doesn't RandomizedSearchCV
or GridSearchCV
support partial_fit
. I don't see any technical reasons as to why this cannot be done (please correct me if I am wrong here). Any leads will be really appreciated.
Yeah, technically you can write a GridSerachCV for partial_fit
as well, but when you think about
it becomes quite different from what we do with the .fit()
approach. Here is my list of reason for not having partial_fit
in GridsearchCV
/RandomSearchCV
.
what is that you are searching for?
When we optimize for the hyper parameters of a model for one batch of data, it could be sub-optimal for the final model (which is trained on complete data using multiple partial_fit
s). Now the problem becomes finding the best schedule for the hyper parameters i.e. what is the optimal value of the hyper parameter at each batch/time step. One example of this is the decaying learning rate in neural networks, where we train the model using multiple partial_fit
s and the hyper parameter - learning rate value would not be a single value but a series values that needs to be used for each time step/batch.
Also you need to loop through the entire dataset multiple times (multiple epochs) to know the best scheduling of the hyper parameters. This needs a basic API change for GridSearchCV.
what is that your are optimizing for?
partial_fit
s or reaching the sweet-spot quickly (in fewer batches) for usual metric (precision
, recall
, f1-score
, etc.), some combination of one and two. Hence, this also needs a API change for computing the single value for summarizing the performance of a model, which was trained using multiple partial_fit
s.