machine-learning scikit-learn classification grid-search

How do you use fit_params for RandomizedSearch with VotingClassifier in Sklearn?


Hi, I'm trying to use fit_params (to pass sample_weight to a GradientBoostingClassifier) with RandomizedSearchCV and a VotingClassifier in sklearn, since the dataset is unbalanced. Could someone give me some advice, and possibly a code sample?

My current, non-working code is below:

random_search = RandomizedSearchCV(my_votingClassifier, param_distributions=param_dist,
                                   n_iter=n_iter_search, n_jobs=-1, fit_params={'sample_weight':y_np_array})

Error:

TypeError: fit() got an unexpected keyword argument 'sample_weight'

Solution

  • Since there doesn't seem to be a direct way to pass the sample_weight parameter through the VotingClassifier, I came up with this little "hack":

    Override the fit method of the underlying classifiers. For example, if you are using a DecisionTreeClassifier, you can subclass it and override fit so that it passes the desired sample_weight on to the parent implementation. In the snippet below the labels themselves are reused as weights purely for illustration; substitute weights appropriate to your data.

    class MyDecisionTreeClassifier(DecisionTreeClassifier):
        def fit(self, X, y=None):
            # Placeholder: reuses y as the weights; use real weights in practice
            return super().fit(X, y, sample_weight=y)
    

    You can now use MyDecisionTreeClassifier in place of DecisionTreeClassifier among the estimators passed to VotingClassifier.

    Full working example:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.ensemble import VotingClassifier
    from sklearn.tree import DecisionTreeClassifier
    # sklearn.grid_search was removed; the search classes live in model_selection
    from sklearn.model_selection import RandomizedSearchCV

    X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    y = np.array([1, 1, 1, 2, 2, 2])

    class MyDecisionTreeClassifier(DecisionTreeClassifier):
        def fit(self, X, y=None):
            # Placeholder: reuses y as the weights; use real weights in practice
            return super().fit(X, y, sample_weight=y)

    clf1 = MyDecisionTreeClassifier()
    clf2 = RandomForestClassifier()
    params = {'dt__max_depth': [5, 10], 'dt__max_features': [1, 2]}
    eclf = VotingClassifier(estimators=[('dt', clf1), ('rf', clf2)], voting='hard')
    # cv=3 because each class has only three samples
    random_search = RandomizedSearchCV(eclf, param_distributions=params, n_iter=4, cv=3)
    random_search.fit(X, y)
    # cv_results_ replaces the removed grid_scores_ attribute
    print(random_search.cv_results_['mean_test_score'])
    print(random_search.best_score_)
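
The same override idea can be sketched for the GradientBoostingClassifier mentioned in the question. This is only an illustration under stated assumptions: BalancedGradientBoostingClassifier is a hypothetical helper name, the toy data is made up, and the weights are derived with sklearn.utils.class_weight.compute_sample_weight rather than taken from the original post. Computing "balanced" weights inside fit has the side benefit of not depending on the label values themselves, which matters because VotingClassifier label-encodes y before handing it to its sub-estimators.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.utils.class_weight import compute_sample_weight


class BalancedGradientBoostingClassifier(GradientBoostingClassifier):
    """Hypothetical helper: fits with class-balanced sample weights by default."""

    def fit(self, X, y, sample_weight=None):
        # Derive weights from the labels this estimator actually receives,
        # unless the caller already supplied some.
        if sample_weight is None:
            sample_weight = compute_sample_weight("balanced", y)
        return super().fit(X, y, sample_weight=sample_weight)


# Toy imbalanced data (made up for this sketch): 12 samples vs. 6 samples.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(12, 2)),
               rng.normal(1.0, 1.0, size=(6, 2))])
y = np.array([0] * 12 + [1] * 6)

eclf = VotingClassifier(
    estimators=[
        ("gb", BalancedGradientBoostingClassifier(n_estimators=10, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=10, random_state=0)),
    ],
    voting="hard",
)
eclf.fit(X, y)  # no sample_weight needed here; the subclass computes it
print(eclf.predict(X[:3]))
```

Because the subclass inherits its constructor, parameters such as 'gb__n_estimators' should still be tunable through RandomizedSearchCV in the usual way.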
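
In more recent scikit-learn releases the subclassing hack may not be needed: the fit_params constructor argument of the search classes has been removed in favour of keyword arguments to fit, and VotingClassifier.fit accepts sample_weight as long as every underlying estimator supports it. A minimal sketch, assuming such a release; the toy data, the "gb"/"rf" names and the parameter grid are illustrative and not taken from the original post.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import RandomizedSearchCV
from sklearn.utils.class_weight import compute_sample_weight

# Toy imbalanced data (made up for this sketch): 12 samples vs. 6 samples.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(12, 2)),
               rng.normal(1.0, 1.0, size=(6, 2))])
y = np.array([0] * 12 + [1] * 6)

# One weight per sample, inversely proportional to its class frequency.
weights = compute_sample_weight(class_weight="balanced", y=y)

eclf = VotingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=10, random_state=0)),
    ],
    voting="hard",
)
param_dist = {"gb__n_estimators": [10, 20], "gb__max_depth": [2, 3]}

random_search = RandomizedSearchCV(eclf, param_distributions=param_dist,
                                   n_iter=4, cv=3, random_state=0)
# Fit parameters go to fit(); the search slices sample_weight for each CV fold.
random_search.fit(X, y, sample_weight=weights)
print(random_search.best_params_)
```

Note that weights passed this way reach every estimator in the ensemble, not only the gradient boosting one; the subclassing approach remains an option when only one estimator should be weighted.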