Why do ML models installed with pip need to download something else after installation?

Why does this statement download the model? Why isn't the model downloaded when I install the package with pip3 install keybert? How can I pre-load it into the Docker image so that it isn't downloaded every time?

from keybert import KeyBERT
kw_model = KeyBERT()

Right now my Dockerfile does the following:

RUN pip install --user -r requirements.txt

requirements.txt:

google-cloud-pubsub==2.8.0
google-cloud-logging==2.6.0
requests==2.28.0
keybert==0.5.1

Solution

One potential solution is to save the model once and copy it into the image:

Run this code on your local computer to save a copy of the model to a local directory, e.g. a directory named "keybert":

from keybert import KeyBERT
kw_model = KeyBERT()
kw_model.model.embedding_model.save("keybert")

Add the local copy of the model to the Docker image using the COPY instruction in the Dockerfile:

# Copy the saved model directory into the container image.
COPY ./keybert/ ./keybert/

In the script that runs in the Docker container, load the model from that directory:

from keybert import KeyBERT
new_kw_model = KeyBERT("./keybert")
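If the same script also has to run outside the container, where the "./keybert" directory may not exist, the load step can be wrapped in a small helper. This is only a sketch: resolve_model_source and load_keybert are hypothetical names, and the fallback model name is an assumption that should be replaced with whichever model you actually saved.

```python
from pathlib import Path

# "./keybert" matches the directory used in the steps above.
LOCAL_MODEL_DIR = "./keybert"
# Assumed fallback model name; substitute the model you actually use.
FALLBACK_MODEL_NAME = "all-MiniLM-L6-v2"


def resolve_model_source(local_dir: str = LOCAL_MODEL_DIR,
                         fallback: str = FALLBACK_MODEL_NAME) -> str:
    """Return the absolute path of local_dir if it exists and is non-empty,
    otherwise return the fallback model name (which would trigger a download)."""
    path = Path(local_dir)
    if path.is_dir() and any(path.iterdir()):
        return str(path.resolve())
    return fallback


def load_keybert(local_dir: str = LOCAL_MODEL_DIR,
                 fallback: str = FALLBACK_MODEL_NAME):
    """Build a KeyBERT instance from the local copy when it is available."""
    # Imported lazily so the path logic above can be used without keybert installed.
    from keybert import KeyBERT
    return KeyBERT(model=resolve_model_source(local_dir, fallback))
```

Inside the container, load_keybert() would pick up the copied directory and skip the download; elsewhere it falls back to the named model.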

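The reason for this behavior is that the pip package contains only KeyBERT's code, not any model weights. KeyBERT relies on separate embedding models such as SBERT models, and it can be used with more than one of them, so the chosen model is downloaded the first time it is loaded. A fresh container starts without that downloaded copy, which is why the download repeats on every run: https://maartengr.github.io/KeyBERT/guides/embeddings.html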

So you'd add a copy of whichever model best suits your purposes to the Docker image.
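As a variation, the save step could be run during the image build rather than on your local computer, so the model never has to be stored next to your source code. The sketch below assumes the build has network access and that keybert is already installed at that point; save_embedding_model and its make_keybert parameter are hypothetical names introduced only for illustration (the parameter exists so the function can be exercised without downloading anything).

```python
from pathlib import Path
from typing import Callable, Optional


def save_embedding_model(target_dir: str,
                         make_keybert: Optional[Callable[[], object]] = None) -> Path:
    """Instantiate KeyBERT once (which downloads the model if it is not cached)
    and save its embedding model to target_dir."""
    if make_keybert is None:
        # Imported lazily; requires keybert to be installed in the image.
        from keybert import KeyBERT
        make_keybert = KeyBERT
    kw_model = make_keybert()
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Same call as in the local save step above.
    kw_model.model.embedding_model.save(str(target))
    return target
```

Calling save_embedding_model("keybert") from a script executed by a RUN instruction placed after the pip install step would leave the model in the image's "keybert" directory, where KeyBERT("./keybert") can load it at runtime.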