I have a saved model for sentiment analysis, along with the code and data used to produce it. I am trying to create a library that exposes functionality from this code and uses the trained model. What I don't understand is how to incorporate the model and the functionality that depends on it.
Can anyone guide me on how to do that specifically?
Edit: Using pickle is the method I went with (answered below)
There are a few things you need to know if you want to maintain such a library properly. Several packaging tools could do the job; the most user-friendly at the moment is probably poetry, so I'll use that as an example. It needs to be installed if you want to use this post as a tutorial.
In order to have a very basic project skeleton to work with, I'll assume that you have something similar to this:
```
modelpersister
├───modelpersister
│   ├───model.pkl
│   ├───__init__.py
│   ├───model_definition.py
│   ├───train.py
│   └───analyze.py
└───pyproject.toml
```
- `model.pkl`: the model artifact that you're going to ship with your package
- `__init__.py`: empty, but it needs to be there to make this folder a Python package
- `model_definition.py`: contains the class definition and the features that define your model
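What goes in here depends entirely on your model, but to make the snippets below concrete, here is a minimal hypothetical sketch. Only the `SentimentAnalyzer` name, the `train()` classmethod, and the `estimate()` method are assumed by the other files; the keyword-counting logic is a made-up placeholder for whatever your real model does:

```python
import re

# Placeholder "model": counts known positive vs. negative words.
# A real implementation would wrap whatever ML library you use.
class SentimentAnalyzer:
    def __init__(self, positive_words, negative_words):
        self.positive_words = positive_words
        self.negative_words = negative_words

    @classmethod
    def train(cls, data):
        # data: iterable of (text, label) pairs, with label 1 = positive
        positive, negative = set(), set()
        for text, label in data:
            words = set(re.findall(r"\w+", text.lower()))
            (positive if label == 1 else negative).update(words)
        return cls(positive, negative)

    def estimate(self, data_point):
        words = set(re.findall(r"\w+", data_point.lower()))
        score = len(words & self.positive_words) - len(words & self.negative_words)
        return 1 if score > 0 else 0
```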
- `train.py`: accepts data to train your model and overwrites the current `model.pkl` file with the result, something roughly like this:

```python
import pickle
from pathlib import Path

from modelpersister.model_definition import SentimentAnalyzer

# overwrite the current model given some new data
def train(data):
    model = SentimentAnalyzer.train(data)
    # pickle needs the file opened in binary write mode
    with open(Path(__file__).parent / "model.pkl", "wb") as model_file:
        pickle.dump(model, model_file)
```
- `analyze.py`: accepts data points and analyzes them with the current `model.pkl`, something roughly like this:

```python
import pickle
import importlib.resources

from modelpersister.model_definition import SentimentAnalyzer

# load the current model as a package resource (small but important detail)
with importlib.resources.path("modelpersister", "model.pkl") as model_path:
    with open(model_path, "rb") as model_file:
        model: SentimentAnalyzer = pickle.load(model_file)

# make meaningful analyses available in this module
def estimate(data_point):
    return model.estimate(data_point)
```
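For users of the finished library, this module is the whole point. A hypothetical session, assuming the package has been installed, might look like this:

```python
# hypothetical consumer code after installing the package
from modelpersister.analyze import estimate

print(estimate("This library saved me a lot of time!"))  # e.g. 1 for positive
```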
- `pyproject.toml`: a metadata file that poetry needs in order to package this code, something very similar to this:

```toml
[tool.poetry]
name = "modelpersister"
version = "0.1.0"
description = "Ship a sentiment analysis model."
authors = ["Mishaal <my@mail.com>"]
license = "MIT"  # a good default as far as licenses go

[tool.poetry.dependencies]
python = "^3.8"
scikit-learn = "^0.23"  # or whichever ML library you used for your model definition

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
```
Given all of these files filled with meaningful code, and hopefully a better project name than modelpersister, your workflow would look roughly like this:

1. Improve the model: refine the class in `model_definition.py`, train it with `train.py` on better data, or add new functions to `analyze.py`, until you feel that your model is noticeably better than before (see the retraining sketch after this list).
2. Run `poetry version minor` to update the package version.
3. Run `poetry build` to package your code and model into a source distribution and a wheel file, on which you can, if you want, perform some final tests.
4. Run `poetry publish` to distribute your package, by default to the global Python Package Index; you can also set up a private PyPI instance and tell poetry about it, or upload the files manually somewhere else.
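As a hypothetical version of that first step, retraining the shipped model could be as simple as the snippet below, with the poetry commands from the list run afterwards. The `(text, label)` pairs are made-up placeholder data:

```python
# retrain the packaged model before cutting a release
from modelpersister.train import train

better_data = [
    ("I loved this film", 1),
    ("Absolutely terrible", 0),
]
train(better_data)  # overwrites modelpersister/model.pkl in the source tree
```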