Tags: python, oop, machine-learning, scikit-learn, gridsearchcv

How to implement sklearn's Estimator interface for use in GridSearchCV pipeline?


So I have my own implementation of a Perceptron classifier, and would like to tune its hyperparameters using sklearn's GridSearchCV. I've been trying to write a wrapper around the model that implements Estimator (read through https://scikit-learn.org/stable/developers/develop.html) but when I run GridSearchCV(wrapper, params).fit(X,y), I get the following error:

FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details:
AttributeError: 'NoneType' object has no attribute 'fit'

  FitFailedWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 738, in fit
    **self.best_params_))
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\base.py", line 67, in clone
    % (repr(estimator), type(estimator)))
TypeError: Cannot clone object 'None' (type <class 'NoneType'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.

This error is identical to the one in "How to write a custom estimator in sklearn and use cross-validation on it?", but I was already doing everything proposed in the top-rated comment.

I am positive that the model is correct. Here is the code for the wrapper around the model:

from models import Perceptron, Softmax, SVM
from sklearn.model_selection import GridSearchCV

class Estimator():
    def __init__(self, alpha=0.5, epochs=100):
        self.alpha = alpha
        self.epochs = epochs
        self.model = Perceptron()

    def fit(self, X, y, **kwargs):
        self.alpha = kwargs['alpha']
        self.epochs = kwargs['epochs']
        self.model.alpha = kwargs['alpha']
        self.model.epochs = kwargs['epochs']
        self.model.train(X, y)

    def predict(self, X):
        return self.model.predict(X)

    def score(self, data, targets):
        return self.model.get_acc(self.predict(data), targets)

    def set_params(self, alpha, epochs):
        self.alpha = alpha
        self.epochs = epochs
        self.model.alpha = alpha
        self.model.epochs = epochs

    def get_params(self, deep=False):
        return {'alpha':self.alpha, 'epochs':self.epochs}


Solution

  • As explained in the scikit-learn documentation, you should derive the class from BaseEstimator, i.e. class Estimator(BaseEstimator):, to avoid boilerplate (it provides get_params() and set_params() based on the __init__ signature) and follow the fit/predict structure. As @Shihab said in the comments, your fit function is missing a return self line. The same convention applies to set_params(): it is expected to return self, and yours returns None, which is consistent with the 'NoneType' messages in your traceback.

    Also, I don't know whether it is intentional, but your get_params() defaults to deep=False; the documentation recommends deep=True as the default. Please check it out.
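
    Putting the points above together, here is a minimal sketch of what the wrapper could look like. Since models.Perceptron is not shown in the question, the Perceptron class below is a hypothetical stand-in that only assumes alpha and epochs attributes plus train(X, y) and predict(X); the name PerceptronEstimator and the toy data are also made up for illustration. Note that GridSearchCV passes hyperparameters through set_params(), not as keyword arguments to fit(), so fit() just reads self.alpha and self.epochs.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV


class Perceptron:
    """Hypothetical stand-in for models.Perceptron (assumed interface only)."""

    def __init__(self):
        self.alpha = 0.5
        self.epochs = 100
        self.w = None

    def train(self, X, y):
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
        self.w = np.zeros(Xb.shape[1])
        for _ in range(self.epochs):
            for xi, yi in zip(Xb, y):
                pred = 1 if xi @ self.w > 0 else 0
                self.w += self.alpha * (yi - pred) * xi  # classic perceptron update

    def predict(self, X):
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        return (Xb @ self.w > 0).astype(int)


class PerceptronEstimator(ClassifierMixin, BaseEstimator):
    def __init__(self, alpha=0.5, epochs=100):
        # Only store the constructor arguments unchanged; BaseEstimator
        # derives get_params()/set_params() from this signature.
        self.alpha = alpha
        self.epochs = epochs

    def fit(self, X, y):
        # Build the model here so it always sees the current hyperparameters.
        self.model_ = Perceptron()
        self.model_.alpha = self.alpha
        self.model_.epochs = self.epochs
        self.model_.train(np.asarray(X), np.asarray(y))
        return self  # required by the estimator convention

    def predict(self, X):
        return self.model_.predict(np.asarray(X))

    # score() is inherited from ClassifierMixin (mean accuracy).


# Toy, linearly separable data purely for illustration.
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

params = {"alpha": [0.1, 0.5], "epochs": [5, 20]}
search = GridSearchCV(PerceptronEstimator(), params, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

    Creating the underlying model inside fit() rather than in __init__ is a design choice here: it keeps __init__ free of logic, which is what clone() relies on, and it guarantees the model is rebuilt with whatever values set_params() assigned.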