I have this code that creates a voting classifier and uses a grid search over several sets of classifier tuples in order to compare them:
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
clf1 = KNeighborsClassifier(n_neighbors=3)
clf2 = RandomForestClassifier(random_state=123)
clf3 = LogisticRegression(max_iter=1000)
clf4 = SVC()
# Voting Classifier
vclf = VotingClassifier(estimators=[('knn', clf1), ('rf', clf2), ('lr', clf3), ('svm', clf4)], voting='hard')
cv3 = KFold(n_splits=4, random_state=111, shuffle=True)
# One label per classifier; 'SVM' was missing, so zip() silently dropped the voting classifier
for clf, label in zip([clf1, clf2, clf3, clf4, vclf], ['KNN', 'Random Forest', 'Logistic Regression', 'SVM', 'Voting Classifier']):
    scores = cross_validate(clf, X_train, y_train, cv=cv3, scoring=['accuracy', 'f1'])
    print("[%s]: \n Accuracy: %0.2f (+/- %0.2f)" % (label, scores['test_accuracy'].mean(), scores['test_accuracy'].std()),
          "F1 score: %0.2f (+/- %0.2f)" % (scores['test_f1'].mean(), scores['test_f1'].std()))
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
params = {'knn__n_neighbors': [5, 9],
          'rf__n_estimators': [20, 100, 200],
          'svm__C': [0.01, 0.1, 1],
          'lr__C': [0.01, 0.1, 1],
          'estimators': [[('knn', clf1), ('lr', clf3)], [('knn', clf1), ('rf', clf2), ('svm', clf4)]]}
grid = GridSearchCV(estimator=vclf, param_grid=params, cv=5)
grid = grid.fit(X_train, y_train)
That returns an error message saying:
dict_keys(['estimators', 'flatten_transform', 'n_jobs', 'verbose', 'voting', 'weights', 'knn', 'rf', 'lr', 'svm', 'knn__algorithm', 'knn__leaf_size', 'knn__metric', 'knn__metric_params', 'knn__n_jobs', 'knn__n_neighbors', 'knn__p', 'knn__weights', 'rf__bootstrap', 'rf__ccp_alpha', 'rf__class_weight', 'rf__criterion', 'rf__max_depth', 'rf__max_features', 'rf__max_leaf_nodes', 'rf__max_samples', 'rf__min_impurity_decrease', 'rf__min_samples_leaf', 'rf__min_samples_split', 'rf__min_weight_fraction_leaf', 'rf__n_estimators', 'rf__n_jobs', 'rf__oob_score', 'rf__random_state', 'rf__verbose', 'rf__warm_start', 'lr__C', 'lr__class_weight', 'lr__dual', 'lr__fit_intercept', 'lr__intercept_scaling', 'lr__l1_ratio', 'lr__max_iter', 'lr__multi_class', 'lr__n_jobs', 'lr__penalty', 'lr__random_state', 'lr__solver', 'lr__tol', 'lr__verbose', 'lr__warm_start', 'svm__C', 'svm__break_ties', 'svm__cache_size', 'svm__class_weight', 'svm__coef0', 'svm__decision_function_shape', 'svm__degree', 'svm__gamma', 'svm__kernel', 'svm__max_iter', 'svm__probability', 'svm__random_state', 'svm__shrinking', 'svm__tol', 'svm__verbose'])
ValueError Traceback (most recent call last)
ValueError: Invalid parameter rf for estimator VotingClassifier(estimators=[('knn', KNeighborsClassifier(n_neighbors=3)),
('lr', LogisticRegression(max_iter=1000))]). Check the list of available parameters with `estimator.get_params().keys()`.
But looking at the parameter list, the rf parameter exists and a list of values is supplied for it. Could you guide me on the correct syntax? Thank you very much for your answer!
In the parameters dict, "estimators" can be one of two possibilities:
params = {
    'estimators': [
        [('knn', clf1), ('lr', clf3)],  # option 1
        [('knn', clf1), ('rf', clf2), ('svm', clf4)]  # option 2
    ]
}
When option 1 is selected, there is no rf__n_estimators parameter, because the estimator list doesn't include the random forest. The grid search still tries to set n_estimators for rf, and raises an error because rf doesn't exist in that configuration.
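The failure can be reproduced outside the grid search. GridSearchCV applies each candidate to a clone of the estimator through set_params, so a minimal sketch (assuming only that scikit-learn is installed, and using rf__n_estimators, the key from the question's grid) is to apply one such candidate by hand. The `candidate` dict and the `error_message` variable are illustrative names, not part of the original code.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

clf1 = KNeighborsClassifier(n_neighbors=3)
clf2 = RandomForestClassifier(random_state=123)
clf3 = LogisticRegression(max_iter=1000)

vclf = VotingClassifier(
    estimators=[('knn', clf1), ('rf', clf2), ('lr', clf3)],
    voting='hard',
)

# One candidate as the single-dict grid would generate it:
# option 1 for 'estimators' (no random forest) combined with an rf setting.
candidate = {
    'estimators': [('knn', clf1), ('lr', clf3)],
    'rf__n_estimators': 20,
}

try:
    # 'estimators' is replaced first, so 'rf' no longer exists
    # by the time 'rf__n_estimators' is looked up.
    vclf.set_params(**candidate)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

This also explains why get_params().keys() on the original vclf lists rf: the key exists only until the candidate replaces the estimator list.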
What you could do is define two separate parameter dictionaries, one for each estimator configuration, and pass them to the grid search as a list. Each dictionary contains only the entries that are valid for its configuration:
params = [
    # This dict is for estimators=[(knn, lr)]
    {'knn__n_neighbors': [5, 9],
     'lr__C': [0.01, 0.1, 1],
     'estimators': [[('knn', clf1), ('lr', clf3)]]},
    # This dict is for estimators=[(knn, rf, svm)]
    {'knn__n_neighbors': [5, 9],
     'rf__n_estimators': [20, 100, 200],
     'svm__C': [0.01, 0.1, 1],
     'estimators': [[('knn', clf1), ('rf', clf2), ('svm', clf4)]]},
]
It now runs on my end without errors.
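For reference, here is a self-contained sketch of the list-of-dicts approach. The data is a synthetic stand-in generated with make_classification (an assumption, since the original X_train and y_train are not shown), and the rf grid and cv value are reduced only to keep the run short. ParameterGrid is used to count how many candidates the list expands to.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the asker's data (assumption, not the real dataset).
X_train, y_train = make_classification(n_samples=200, n_features=10, random_state=0)

clf1 = KNeighborsClassifier(n_neighbors=3)
clf2 = RandomForestClassifier(random_state=123)
clf3 = LogisticRegression(max_iter=1000)
clf4 = SVC()
vclf = VotingClassifier(
    estimators=[('knn', clf1), ('rf', clf2), ('lr', clf3), ('svm', clf4)],
    voting='hard',
)

# One dict per estimator configuration; each holds only keys valid for it.
params = [
    {'knn__n_neighbors': [5, 9],
     'lr__C': [0.01, 0.1, 1],
     'estimators': [[('knn', clf1), ('lr', clf3)]]},
    {'knn__n_neighbors': [5, 9],
     'rf__n_estimators': [20, 100],  # shortened grid to keep the sketch fast
     'svm__C': [0.01, 0.1, 1],
     'estimators': [[('knn', clf1), ('rf', clf2), ('svm', clf4)]]},
]

# Each dict is expanded independently: 2*3 + 2*2*3 candidates.
n_candidates = len(ParameterGrid(params))

grid = GridSearchCV(estimator=vclf, param_grid=params, cv=3)
grid.fit(X_train, y_train)

# Names of the estimators in the winning configuration.
best_names = [name for name, _ in grid.best_params_['estimators']]
print(n_candidates, best_names, grid.best_score_)
```

Because the two dicts are never mixed, no candidate combines an rf or svm setting with the knn and lr configuration, which is what triggered the original ValueError.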