python machine-learning scikit-learn knn gridsearchcv

Getting best f1 score using Gridsearch


I'm currently running GridSearchCV to find the best hyperparameters for f1 scores.

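import numpy as np
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Micro-averaged F1 wrapped as a scorer for GridSearchCV
f1 = make_scorer(f1_score, average='micro')

# Hyperparameter grid for the k-NN classifier
grid = {'n_neighbors': np.arange(1, 16),
        'p': np.arange(1, 3),
        'weights': ['uniform', 'distance'],
        'algorithm': ['auto']
       }

# X_train_res, y_train_res and result_train are defined earlier (not shown)
knn = KNeighborsClassifier()
knn_cv = GridSearchCV(knn, grid, cv=3, verbose=3, scoring=f1)
knn_cv.fit(X_train_res, y_train_res)

print("Hyperparameters:", knn_cv.best_params_)
# best_score_ is the mean cross-validated score of the best parameter set
print("Train Score:", knn_cv.best_score_)
result_train["GridSearch-Best-Train"] = knn_cv.best_score_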

However, I've noticed that the best hyperparameters and the best score stay the same as they were with accuracy after switching the scoring to f1. Am I using it incorrectly?


Solution

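  • This isn't too surprising, especially if your classes aren't too imbalanced, and I don't see anything immediately wrong with your code. Note also that with average='micro' on a single-label problem, F1 is computed from the true positives, false positives and false negatives pooled over all classes, which makes it equal to accuracy, so an identical best score is expected with this particular scorer.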

    To add some supporting evidence that things are working as expected, have a look at the knn_cv.cv_results_ for both scorers (probably easiest to inspect if you turn that dictionary into a pandas dataframe). In fact, you can specify more than one scorer, so that the cv_results_ attribute will show you both scores in the same dictionary/frame. You might want to throw in a more continuous score, like log-loss.
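
As a rough sketch of that comparison (not your exact setup): the snippet below uses a synthetic dataset from make_classification as a stand-in for X_train_res / y_train_res, a smaller grid than yours, and scorer names ("f1_micro", "f1_macro", and so on) that I picked for illustration. With several scorers, refit has to name the one that decides best_params_; choosing "f1_macro" here is just an assumption for the example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, mildly imbalanced 3-class data (stand-in for the real training set)
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, weights=[0.5, 0.3, 0.2],
                           random_state=0)

# Several scorers evaluated in the same search; the dict keys are arbitrary labels
scoring = {
    "accuracy": "accuracy",
    "f1_micro": make_scorer(f1_score, average="micro"),
    "f1_macro": make_scorer(f1_score, average="macro"),
    "neg_log_loss": "neg_log_loss",  # a more continuous score
}

# Reduced grid to keep the example quick
grid = {"n_neighbors": [1, 5, 15], "weights": ["uniform", "distance"]}

# refit names the scorer used to select best_params_ and refit the final model
knn_cv = GridSearchCV(KNeighborsClassifier(), grid, cv=3,
                      scoring=scoring, refit="f1_macro")
knn_cv.fit(X, y)

# One row per parameter combination, one mean_test_<name> column per scorer
results = pd.DataFrame(knn_cv.cv_results_)
cols = ["param_n_neighbors", "param_weights"]
cols += [f"mean_test_{name}" for name in scoring]
print(results[cols].sort_values("mean_test_f1_macro", ascending=False))
```

In the resulting frame, the mean_test_accuracy and mean_test_f1_micro columns should match, while mean_test_f1_macro and mean_test_neg_log_loss can rank the parameter combinations differently. That is the place to look if you want to see whether the choice of scorer changes anything.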