python scikit-learn regression scoring gridsearchcv

How to set a custom scoring function with GridSearchCV from sklearn for regression?


I used to use GridSearchCV(...scoring="accuracy"...) for classification models. Now I am about to use GridSearchCV for a regression model and want to set the scoring to my own error function.

Example code:

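import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

def rmse(predict, actual):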
    predict = np.array(predict)
    actual = np.array(actual)

    distance = predict - actual

    square_distance = distance ** 2

    mean_square_distance = square_distance.mean()

    score = np.sqrt(mean_square_distance)

    return score

rmse_score = make_scorer(rmse)

gsSVR = GridSearchCV(...scoring=rmse_score...)
gsSVR.fit(X_train,Y_train)
SVR_best = gsSVR.best_estimator_
print(gsSVR.best_score_)

However, I found that this returns the parameter set with the highest error score, so I end up with the worst parameter set and score. How can I get the best estimator and score in this case?

summary:

classification -> GridSearchCV(scoring="accuracy") -> best_estimator_ ... best

regression -> GridSearchCV(scoring=rmse_score) -> best_estimator_ ... worst


Solution

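  • RMSE is technically a loss, where lower is better. By default make_scorer assumes that higher is better, so GridSearchCV picks the parameters with the largest error. You can tell make_scorer that the function is a loss by setting greater_is_better=False: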

    greater_is_better : boolean, default=True Whether score_func is a score function (default), meaning high is good, or a loss function, meaning low is good. In the latter case, the scorer object will sign-flip the outcome of the score_func.

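    You should also change the order of inputs from rmse(predict, actual) to rmse(actual, predict), because that's the order in which the scorer passes them: the true values first, then the predictions. RMSE itself is symmetric in its two arguments, so the result is unchanged here, but the order matters for any metric that is not. So the final scorer will look like this: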

    def rmse(actual, predict):
    
        ...
        ...
        return score
    
    rmse_score = make_scorer(rmse, greater_is_better=False)
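
To see the effect end to end, here is a minimal sketch of the scorer above plugged into a grid search. The synthetic data from make_regression, the linear-kernel SVR, and the small grid over C are assumptions chosen only for illustration; they are not taken from the question. Because the scorer sign-flips the loss, best_score_ comes back as a negative number, and negating it recovers the RMSE.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR


def rmse(actual, predict):
    """Root mean squared error, with arguments in (y_true, y_pred) order."""
    actual = np.asarray(actual)
    predict = np.asarray(predict)
    return np.sqrt(np.mean((predict - actual) ** 2))


# greater_is_better=False makes the scorer return -rmse, so the
# "higher is better" rule used by GridSearchCV selects the lowest error.
rmse_score = make_scorer(rmse, greater_is_better=False)

# Synthetic data and a small grid: illustrative assumptions only.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
param_grid = {"C": [0.1, 1.0, 10.0, 100.0]}

gs = GridSearchCV(SVR(kernel="linear"), param_grid, scoring=rmse_score, cv=3)
gs.fit(X, y)

# best_score_ is the sign-flipped loss; negate it to get the RMSE back.
best_rmse = -gs.best_score_
print(gs.best_params_, best_rmse)
```

Recent scikit-learn versions also accept the built-in string scoring="neg_root_mean_squared_error", which follows the same negated-loss convention, so a custom scorer is only needed for metrics that are not already provided.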