I am working on a supervised machine learning problem and it is showing some curious behavior, so let me start from the beginning.
I have a function to which I pass different classifiers, their parameters, the training data and its labels:
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search in older versions

def HT(targets, train_new, algorithm, parameters):
    # create the scorer
    scorer = make_scorer(f1_score)
    # create the grid search object with the parameters of the function
    grid_search = GridSearchCV(algorithm,
                               param_grid=parameters, scoring=scorer, cv=5)
    # fit the grid_search object to the data
    grid_search.fit(train_new, targets.ravel())
    # print the name of the classifier, the best score and best parameters
    print(algorithm.__class__.__name__)
    print('Best score: {}'.format(grid_search.best_score_))
    print('Best parameters: {}'.format(grid_search.best_params_))
    # assign the best estimator to the pipeline variable
    pipeline = grid_search.best_estimator_
    # predict the results for the training set
    results = pipeline.predict(train_new).astype(int)
    print(results)
    return pipeline
To this function I pass parameters like:
clf_param.append({'C': np.array([0.001, 0.01, 0.1, 1, 10]),
                  'kernel': ['linear', 'rbf'],
                  'decision_function_shape': ['ovr']})
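For completeness, here is a minimal sketch of how the function might be called; the SVC classifier and the random toy data are just placeholders, not my real setup:

import numpy as np
from sklearn.svm import SVC

clf_param = []
clf_param.append({'C': np.array([0.001, 0.01, 0.1, 1, 10]),
                  'kernel': ['linear', 'rbf'],
                  'decision_function_shape': ['ovr']})

# placeholder training data: 50 samples, 4 features, binary labels
train_new = np.random.rand(50, 4)
targets = np.random.randint(0, 2, size=(50, 1))

pipeline = HT(targets, train_new, SVC(), clf_param[0])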
OK, so here is where things start to get strange. This function returns an f1_score, but it is different from the score I compute manually with the formula F1 = 2 * (precision * recall) / (precision + recall).
The differences are pretty big (0.68 compared with 0.89).
Am I doing something wrong in the function? Should the score computed by the grid search (grid_search.best_score_) be the same as the score on the whole training set (grid_search.best_estimator_.predict(train_new))? Thanks
The score that you are calculating manually takes into account the global true positives and false negatives across all classes. In scikit-learn's f1_score, however, the default approach is the binary average, i.e. only the positive class is scored.
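A quick toy illustration of that difference (the labels below are made up, not the asker's data):

from sklearn.metrics import f1_score

y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0, 1, 1]

print(f1_score(y_true, y_pred))                   # default 'binary': positive class only -> 0.8
print(f1_score(y_true, y_pred, average='micro'))  # global TP/FP/FN over both classes     -> 0.75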
So, in order to achieve the same scores, use the f1_score as specified below:
scorer=make_scorer(f1_score, average='micro')
Or simply, in GridSearchCV, use:
scoring = 'f1_micro'
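As a rough sketch (the SVC and the small grid here are placeholders, not taken from the question), that would look like:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# same grid search as above, but with the built-in micro-averaged F1 scorer
grid_search = GridSearchCV(SVC(),
                           param_grid={'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']},
                           scoring='f1_micro', cv=5)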
More information about how the averaging of scores is done is given at http://scikit-learn.org/stable/modules/model_evaluation.html#common-cases-predefined-values
You may also want to take a look at the following answer, which describes the calculation of scores in scikit-learn in detail:
EDIT: Changed macro to micro. As written in the documentation:
'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.
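In other words, micro averaging applies exactly the formula you used by hand, just with the counts pooled over all classes. A small check with made-up multiclass labels (not your data):

import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0, 1, 2, 0, 1, 2, 0, 2])
y_pred = np.array([0, 2, 2, 0, 1, 1, 0, 2])

# in single-label multiclass, each error is one FP (for the predicted class)
# and one FN (for the true class), so the pooled counts are:
tp = np.sum(y_true == y_pred)
fp = np.sum(y_true != y_pred)
fn = np.sum(y_true != y_pred)

precision = tp / float(tp + fp)
recall = tp / float(tp + fn)
print(2 * precision * recall / (precision + recall))   # manual formula -> 0.75
print(f1_score(y_true, y_pred, average='micro'))       # same value     -> 0.75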