python machine-learning scikit-learn classification pmml

Moving a classification model to a production environment


I am designing the architecture of an analytics system. I have a classification ensemble model developed in scikit-learn, and I want to move it to the production environment so that new incoming data can be classified on the fly with this model. Ideally, the system should support manually uploading a "model" into the production system. I don't have any experience with analytics production systems, so any suggestions would be very helpful.

I have checked out Py2PMML, but it does not support all the models; primarily, I am looking for boosted tree regression. PS: I am not asking for code or samples, just the right direction.


Solution

  • At the moment there isn't an official way to export scikit-learn models to PMML. The recommended approach is to use pickle or joblib.dump; see the model persistence section of the scikit-learn docs. The idea is to save the model to disk with:

    >>> import joblib  # standalone package; sklearn.externals.joblib was removed in scikit-learn 0.23
    >>> joblib.dump(model, 'saved_model.pkl')
    

    Then upload it to your server in production and load it with:

    >>> import joblib
    >>> model = joblib.load('saved_model.pkl')
    

    Keep the training and production environments as similar as possible: a model saved with one version of scikit-learn might not load, or might behave differently, in another version.
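A minimal sketch of the save/upload/load workflow above, assuming a recent scikit-learn and the standalone `joblib` package on both machines. `save_model`, `load_model`, and `ModelService` are hypothetical names for illustration, not part of scikit-learn. The sketch stores the training scikit-learn version next to the model so production can refuse a mismatched file, and `ModelService.reload` stands in for the "manual upload" step from the question. `GradientBoostingRegressor` is used only because the question mentions boosted tree regression.

```python
import os
import tempfile

import joblib
import numpy as np
import sklearn
from sklearn.ensemble import GradientBoostingRegressor


def save_model(model, path):
    """Persist the model together with the scikit-learn version used to train it."""
    bundle = {"model": model, "sklearn_version": sklearn.__version__}
    joblib.dump(bundle, path)


def load_model(path):
    """Load a bundle written by save_model and reject a version mismatch.

    Only load files you trust: unpickling can execute arbitrary code.
    """
    bundle = joblib.load(path)
    trained_with = bundle["sklearn_version"]
    if trained_with != sklearn.__version__:
        raise RuntimeError(
            f"model trained with scikit-learn {trained_with}, "
            f"but this environment runs {sklearn.__version__}"
        )
    return bundle["model"]


class ModelService:
    """Hypothetical wrapper: serves predictions and swaps in a newly uploaded model."""

    def __init__(self, path):
        self.model = load_model(path)

    def reload(self, path):
        # load_model raises before assignment, so a bad upload keeps the old model.
        self.model = load_model(path)

    def predict(self, rows):
        return self.model.predict(np.asarray(rows, dtype=float))


if __name__ == "__main__":
    # Training side: fit a small boosted-tree model on synthetic data and save it.
    rng = np.random.RandomState(0)
    X = rng.rand(200, 3)
    y = 2.0 * X[:, 0] + X[:, 1]
    model = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)

    path = os.path.join(tempfile.mkdtemp(), "saved_model.joblib")
    save_model(model, path)

    # Production side: load the uploaded file and classify/score incoming rows.
    service = ModelService(path)
    print(service.predict([[0.5, 0.5, 0.5]]))
```

Requiring an exact version match is deliberately strict; in practice you might pin the scikit-learn version in both environments instead, but failing loudly at load time is safer than serving predictions from a model that unpickled with subtly different behavior.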