Tags: machine-learning, tensorflow, scikit-learn, preprocessor, sklearn-pandas

Is there a way to save the preprocessing objects in scikit-learn?


I am building a neural net to make predictions on new data in the future. I first preprocess the training data using sklearn.preprocessing, then train the model, make some predictions, and close the program. When new data comes in later, I have to apply the same preprocessing scales to it before feeding it into the model. Currently, I have to load all of the old data, fit the preprocessors again, and then transform the new data with them. Is there a way to save the preprocessing objects (like sklearn.preprocessing.StandardScaler) so that I can just load the old objects rather than remake them?


Solution

  • As mentioned by lejlot, you can use the pickle library to save a fitted object (the preprocessor as well as the trained network) to a file on your hard drive. Later you only need to load that file to start making predictions.

    Here is an example of how to use pickle to save and load Python objects:

    import pickle
    import numpy as np

    npTest_obj = np.asarray([[1, 2, 3], [6, 5, 4], [8, 7, 9]])
    strTest_obj = "pickle example XXXX"

    if __name__ == "__main__":
        # Store each object in its own file; `with` closes the file afterwards
        with open("npObject.p", "wb") as f:
            pickle.dump(npTest_obj, f)
        with open("strObject.p", "wb") as f:
            pickle.dump(strTest_obj, f)

        # Read the objects back from disk
        with open("strObject.p", "rb") as f:
            str_readObj = pickle.load(f)
        with open("npObject.p", "rb") as f:
            np_readObj = pickle.load(f)
        print(str_readObj)
        print(np_readObj)
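
The same pattern should carry over to the preprocessing objects from the question. Below is a minimal sketch, not a definitive recipe: the training array and the file name "scaler.pkl" are made up for illustration, and it assumes the saving and loading environments use the same scikit-learn version.

```python
import pickle

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data, invented for this sketch
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Fit the scaler once on the training data
scaler = StandardScaler().fit(X_train)

# Save the fitted scaler ("scaler.pkl" is an assumed file name)
with open("scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)

# Later, e.g. in a new session: load the scaler instead of refitting it
with open("scaler.pkl", "rb") as f:
    loaded_scaler = pickle.load(f)

# Transform new data with the mean and scale learned from the training data
X_new = np.array([[2.0, 20.0]])
X_new_scaled = loaded_scaler.transform(X_new)
print(X_new_scaled)
```

The loaded object keeps its fitted attributes (such as mean_ and scale_), so the old data does not need to be reloaded. joblib.dump and joblib.load are a commonly used alternative to pickle for scikit-learn objects. In either case, only load files from a source you trust.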