
Packaging Libraries with ML models in Python

I have a saved sentiment analysis model, along with the code and data used to build it. I am trying to create a library that exposes functionality from this code and uses this trained model. I don't understand how to incorporate the model and the functionality that depends on it.

Can anyone guide me on how to do that specifically?

Edit: Using pickle is the method I went with (answered below)


  • You need to know about three things if you want to maintain such a library properly:

    • how to build a package
    • how to version a package
    • how to distribute a package

    There are a few tools for this. The most user-friendly at the moment is probably poetry, so I'll use it as an example. You need to have it installed if you want to follow this post as a tutorial.

    To have a very basic project skeleton to work with, I'll assume that you have something similar to this:

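    .
    ├───modelpersister
    │   ├───model.pkl
    │   ├───__init__.py
    │   ├───model_definition.py
    │   ├───<training module>.py
    │   └───<analysis module>.py
    └───pyproject.toml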
    • model.pkl: the model artifact that you're going to ship with your package
    • __init__.py: empty, but it needs to be there to make this folder a Python package
    • model_definition.py: contains the class definition and features that define your model
    • <training module>.py: accepts data to train your model and overwrites the current model.pkl file with the result, roughly like this:
    import pickle
    from pathlib import Path
    from modelpersister.model_definition import SentimentAnalyzer
    # overwrite the current model given some new data
    # (meant to be run from the source checkout, before `poetry build`)
    def train(data):
        model = SentimentAnalyzer.train(data)
        # pickle writes bytes, so the file must be opened in binary write mode
        with open(Path(__file__).parent / "model.pkl", "wb") as model_file:
            pickle.dump(model, model_file)
    • <analysis module>.py: accepts data points and analyzes them with the current model.pkl, roughly like this:
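    import pickle
    import importlib.resources
    from modelpersister.model_definition import SentimentAnalyzer
    # load the current model as a package resource (small but important detail);
    # path() yields a real file path (extracting to a temporary file if needed),
    # so open it in binary read mode and load the model inside the `with` block
    with importlib.resources.path("modelpersister", "model.pkl") as model_path:
        with open(model_path, "rb") as model_file:
            model: SentimentAnalyzer = pickle.load(model_file)
    # make meaningful analyses available in this file
    def estimate(data_point):
        return model.estimate(data_point)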
    • pyproject.toml: a metadata file that poetry needs in order to package this code, something very similar to this:
    [tool.poetry]
    name = "modelpersister"
    version = "0.1.0"
    description = "Ship a sentiment analysis model."
    authors = ["Mishaal <>"]
    license = "MIT"  # a good default as far as licenses go

    [tool.poetry.dependencies]
    python = "^3.8"
    scikit-learn = "^0.23"  # or whichever ML library you used for your model definition

    [build-system]
    requires = ["poetry>=0.12"]
    build-backend = "poetry.masonry.api"
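The answer doesn't show model_definition.py itself. Below is a minimal, hypothetical stand-in for SentimentAnalyzer, assuming only the interface the training and analysis code relies on: a `train` classmethod that takes the training data and an `estimate` method that takes one data point. The toy word-counting scorer, the `(text, label)` data format and the "pos"/"neg" labels are assumptions standing in for a real ML library. The `__main__` part repeats the pickle round trip in a temporary directory rather than inside the package.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path


class SentimentAnalyzer:
    """Hypothetical toy model: scores text by words seen in positive vs negative examples."""

    def __init__(self, word_scores):
        # maps a lowercase word to a net score (positive count minus negative count)
        self.word_scores = word_scores

    @classmethod
    def train(cls, data):
        # data: iterable of (text, label) pairs, label "pos" or "neg" (an assumption)
        scores = Counter()
        for text, label in data:
            sign = 1 if label == "pos" else -1
            for word in text.lower().split():
                scores[word] += sign
        return cls(dict(scores))

    def estimate(self, data_point):
        # sum the scores of known words; unknown words contribute nothing
        total = sum(self.word_scores.get(w, 0) for w in data_point.lower().split())
        return "pos" if total >= 0 else "neg"


if __name__ == "__main__":
    model = SentimentAnalyzer.train([
        ("great movie loved it", "pos"),
        ("awful plot hated it", "neg"),
    ])
    # pickle round trip: write in binary mode, read back in binary mode
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.pkl"
        with open(path, "wb") as f:
            pickle.dump(model, f)
        with open(path, "rb") as f:
            restored = pickle.load(f)
    print(restored.estimate("loved it"))  # → pos
    print(restored.estimate("hated it"))  # → neg
```

Because pickle stores a reference to the class (module path plus class name) rather than its code, model.pkl can only be loaded where modelpersister.model_definition is importable. That is one reason to ship the class definition in the same package as the artifact.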

    Once all of these files are filled with meaningful code (and, hopefully, the project has a better name than modelpersister), your workflow would look roughly like this:

    • update the features in model_definition.py, retrain your model with the training module on better data, or add new functions to the analysis module, until you feel your model is noticeably better than before
    • run poetry version minor to update the package version
    • run poetry build to package your code and model into a source distribution and a wheel, which you can run some final tests on if you want
    • run poetry publish to distribute your package, by default to the public Python Package Index (PyPI); you can also set up a private PyPI instance and tell poetry about it, or upload the files manually somewhere else
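One cheap final test on the output of poetry build is to confirm that the wheel really contains model.pkl. By default poetry leaves out files that your version control ignores, so a model listed in .gitignore would silently be missing and only surface as an error when the analysis module is imported. A wheel is a zip archive, so the standard library's zipfile module can list its contents. The helper name `wheel_contains` and the default member path `modelpersister/model.pkl` are assumptions based on the skeleton in this answer.

```python
import zipfile
from pathlib import Path


def wheel_contains(wheel_path, member="modelpersister/model.pkl"):
    """Return True if the wheel (a zip archive) contains the given file."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return member in wheel.namelist()


if __name__ == "__main__":
    # check every wheel that `poetry build` left in dist/ (relative to the current directory)
    wheels = sorted(Path("dist").glob("*.whl"))
    if not wheels:
        print("no wheel found in dist/, run `poetry build` first")
    for wheel_path in wheels:
        status = "ok" if wheel_contains(wheel_path) else "MISSING model.pkl"
        print(f"{wheel_path.name}: {status}")
```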