python, scikit-learn, normalization, standardized

How to store scaling parameters for later use


I want to apply the scaling provided by scikit-learn's sklearn.preprocessing.scale module to center a dataset that I will then use to train an SVM classifier.

How can I then store the standardization parameters so that I can also apply them to the data that I want to classify?

I know I can use StandardScaler, but can I somehow serialize it to a file so that I won't have to fit it to my data every time I want to run the classifier?


Solution

  • I think the best way is to pickle it after fitting, as this is the most generic option. Perhaps you'll later create a pipeline composed of both a feature extractor and a scaler; by pickling a (possibly compound) stage, you're making things more generic. The sklearn documentation on model persistence discusses how to do this (a minimal sketch follows at the end of this answer).

    Having said that, you can query sklearn.preprocessing.StandardScaler for the fit parameters:

    scale_ : ndarray, shape (n_features,)
        Per-feature relative scaling of the data.
        New in version 0.17: scale_ is recommended instead of the deprecated std_.

    mean_ : array of floats with shape [n_features]
        The mean value for each feature in the training set.

    The following short snippet illustrates this:

    from sklearn import preprocessing
    import numpy as np
    
    s = preprocessing.StandardScaler()
    s.fit(np.array([[1., 2, 3, 4]]).T)
    print((s.mean_, s.scale_))
    # (array([2.5]), array([1.11803399]))
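
    As the minimal sketch of the pickling approach promised above (the file name scaler.pkl is just an example), you could fit the scaler once, dump it to disk, and reload it when classifying new data:

    import pickle

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # Fit the scaler on the training data once.
    X_train = np.array([[1., 2., 3., 4.]]).T
    scaler = StandardScaler().fit(X_train)

    # Serialize the fitted scaler to disk (the file name is arbitrary).
    with open("scaler.pkl", "wb") as f:
        pickle.dump(scaler, f)

    # Later, in the classification script, reload it and reuse the same parameters.
    with open("scaler.pkl", "rb") as f:
        scaler = pickle.load(f)

    X_new = np.array([[0., 5.]]).T
    print(scaler.transform(X_new))  # uses the stored mean_ and scale_

    The model persistence docs also mention joblib.dump/joblib.load, which is more efficient than plain pickle for objects carrying large NumPy arrays.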
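
    Alternatively, if you would rather store only the raw parameters (for example with np.save) instead of the whole object, the same standardization can be reproduced by hand, since with the default settings StandardScaler.transform computes (X - mean_) / scale_:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X_train = np.array([[1., 2., 3., 4.]]).T
    s = StandardScaler().fit(X_train)

    # Keep just the two parameter arrays.
    mean, scale = s.mean_, s.scale_

    X_new = np.array([[0., 5.]]).T
    manual = (X_new - mean) / scale                  # reproduce the transform by hand
    print(np.allclose(manual, s.transform(X_new)))   # True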