Tags: python, python-3.x, scikit-learn, svm, gridsearchcv

target scaling using GridSearchCV


For hyperparameter tuning I use GridSearchCV from the Python package sklearn. Some of the models I test require feature scaling (e.g. Support Vector Regression, SVR). Recently, in the Udemy course Machine Learning A-Z™: Hands-On Python & R In Data Science, the instructors mentioned that for SVR the target should also be scaled (if it is not binary). With this in mind, I wonder whether the target is also scaled in each iteration of the cross-validation performed by GridSearchCV, or whether only the features are scaled. The code below illustrates the procedure I normally use for hyperparameter tuning with estimators that require scaling of the training data:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR
    
def SVRegressor(**kwargs):
    '''Construct a pipeline that scales the features and performs SVR.'''
    return make_pipeline(StandardScaler(), SVR(**kwargs))

params = {'svr__kernel': ["poly", "rbf"]}
grid_search = GridSearchCV(SVRegressor(), params)
grid_search.fit(X, y)

I know that I could simply scale X and y a priori and drop the StandardScaler from the pipeline. However, I want to use this approach in a code pipeline where multiple models are tested, some of which require scaling and others of which do not (see the sketch below). That is why I want to know how GridSearchCV handles scaling under the hood.
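For context, here is a rough sketch of the kind of multi-model setup I mean; the candidate models and parameter grids are just placeholders:

from sklearn.ensemble import RandomForestRegressor

# hypothetical setup: each candidate is a (pipeline, param_grid) pair;
# only some pipelines include a scaling step
candidates = {
    'svr': (make_pipeline(StandardScaler(), SVR()),
            {'svr__kernel': ["poly", "rbf"]}),
    'rf': (make_pipeline(RandomForestRegressor()),
           {'randomforestregressor__n_estimators': [100, 300]}),
}

for name, (pipe, param_grid) in candidates.items():
    search = GridSearchCV(pipe, param_grid)
    search.fit(X, y)
    print(name, search.best_score_)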


Solution

  • No, it does not scale the target. If you look at how the pipeline fits its steps (_fit_transform_one, reproduced below), it simply passes X and y to each transformer, and StandardScaler() does nothing with y:

    def _fit_transform_one(transformer,
                           X,
                           y,
                           weight,
                           message_clsname='',
                           message=None,
                           **fit_params):
        """
        Fits ``transformer`` to ``X`` and ``y``. The transformed result is returned
        with the fitted transformer. If ``weight`` is not ``None``, the result will
        be multiplied by ``weight``.
        """
        with _print_elapsed_time(message_clsname, message):
            if hasattr(transformer, 'fit_transform'):
                res = transformer.fit_transform(X, y, **fit_params)
            else:
                res = transformer.fit(X, y, **fit_params).transform(X)
    
        if weight is None:
            return res, transformer
        return res * weight, transformer
    

    You can try this with StandardScaler() directly and see that it does nothing with y:

    import numpy as np

    np.random.seed(111)
    X = np.random.normal(5,2,(100,3))
    y = np.random.normal(5,2,100)
    
    res = StandardScaler().fit_transform(X=X,y=y)
    res.shape
    (100, 3)
    
    res.mean(axis=0)
    array([1.01030295e-15, 4.39648318e-16, 8.91509089e-16])
    
    res.std(axis=0)
    array([1., 1., 1.])
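
    For completeness, y itself is left untouched by the scaler; it still has roughly the mean and standard deviation it was simulated with:

    # y keeps its original scale (mean ≈ 5, std ≈ 2 from the simulation above)
    y.mean(), y.std()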
    

    You can also check the result of your GridSearchCV:

    SVRegressor = make_pipeline(StandardScaler(), SVR())
    params = {'svr__kernel': ["poly", "rbf"]}
    grid_search = GridSearchCV(SVRegressor, params,
                               scoring='neg_mean_absolute_error')
    

    With the unscaled y, you will see that the negative mean absolute error is on the same scale as the standard deviation of y (2 in my example):

    grid_search.fit(X, y)
    
    grid_search.cv_results_['mean_test_score']
    array([-2.01029707, -1.88779205])
    

    With a scaled y, the standard deviation is 1, and you can see the error is now around -1:

    y_scaled = StandardScaler().fit_transform(y.reshape(-1,1)).ravel()
    grid_search.fit(X, y_scaled)
    
    grid_search.cv_results_['mean_test_score']
    array([-1.00585999, -0.88330208])
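
    As a side note beyond the original answer: if you do want y standardized inside each cross-validation split as well, one option is to wrap the pipeline in sklearn's TransformedTargetRegressor, which fits the target transformer on each training fold and maps predictions back to the original scale before scoring. A sketch along those lines:

    from sklearn.compose import TransformedTargetRegressor

    # wrap the feature-scaling pipeline so that y is also standardized per fold
    model = TransformedTargetRegressor(
        regressor=make_pipeline(StandardScaler(), SVR()),
        transformer=StandardScaler(),
    )
    params = {'regressor__svr__kernel': ["poly", "rbf"]}
    grid_search = GridSearchCV(model, params,
                               scoring='neg_mean_absolute_error')
    grid_search.fit(X, y)

    Since predictions are inverse-transformed back to the original scale of y, the reported scores remain comparable to the unscaled case above.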