Why does this statement download the model? Why isn't it downloaded when I install the package with pip3 install keybert? How can I pre-load it into the Docker image so it isn't downloaded every time?
from keybert import KeyBERT
kw_model = KeyBERT()
Right now my Dockerfile does the following:
RUN pip install --user -r requirements.txt
requirements.txt:
google-cloud-pubsub==2.8.0
google-cloud-logging==2.6.0
requests==2.28.0
keybert==0.5.1
One potential solution is to save the embedding model to a local directory once:
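from keybert import KeyBERT
# Instantiating KeyBERT downloads the default embedding model
kw_model = KeyBERT()
# Write the underlying Sentence-Transformers model to ./keybert
kw_model.model.embedding_model.save("keybert")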
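Then copy the saved directory into the image with a COPY command in the Dockerfile (a relative destination is resolved against the image's WORKDIR):
# Copy the saved model into the container image.
COPY ./keybert/ ./keybert/
Finally, load the model from that directory instead of letting KeyBERT download it: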
from keybert import KeyBERT
# Load the embedding model from the copied directory, so nothing is downloaded
new_kw_model = KeyBERT("./keybert")
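If the same code also runs outside Docker, where ./keybert may not exist, a small fallback can keep it working. This is a minimal sketch under stated assumptions, not part of KeyBERT: resolve_model_source and load_keyword_model are hypothetical helpers, the check relies on SentenceTransformer.save writing a modules.json file at the top of the directory, and "all-MiniLM-L6-v2" is only an example fallback name.

```python
import os


def resolve_model_source(local_dir: str, hub_name: str) -> str:
    """Return local_dir if it holds a saved Sentence-Transformers model, else hub_name."""
    # SentenceTransformer.save() writes modules.json at the top of the directory,
    # so its presence is a cheap sign that the model was copied into the image.
    if os.path.isfile(os.path.join(local_dir, "modules.json")):
        return local_dir
    # Falling back to a hub name means the model is downloaded on first use.
    return hub_name


def load_keyword_model(local_dir: str = "./keybert", hub_name: str = "all-MiniLM-L6-v2"):
    """Hypothetical helper: build KeyBERT from the local copy when it exists."""
    # Imported here so resolve_model_source works even without keybert installed.
    from keybert import KeyBERT

    return KeyBERT(resolve_model_source(local_dir, hub_name))
```

Inside the container, load_keyword_model() then picks up ./keybert, while on a machine without the copy it falls back to downloading the named model.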
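pip does not download the model because KeyBERT does not bundle one: pip installs only the package code, and KeyBERT() loads a Sentence-Transformers (SBERT) embedding model, which is downloaded the first time it is needed and then cached locally. Unless that cache is part of the image, every new container starts without it, so the download happens again. KeyBERT also works with many embedding models other than the default: https://maartengr.github.io/KeyBERT/guides/embeddings.html
So add a copy of whichever model best suits your purposes to the Docker image.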
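If you need more than one model, one option is to save each into its own directory before building the image and COPY the whole root directory. This is a sketch, assuming a hypothetical "models" root and a made-up naming scheme (slashes in hub names replaced by "__"); it uses sentence-transformers directly, which keybert depends on, and SentenceTransformer.save writes the same files as the kw_model.model.embedding_model.save call above.

```python
import os


def local_model_dir(root: str, model_name: str) -> str:
    """Map a hub model name to a directory under root (hypothetical layout)."""
    # Hub names may carry an organisation prefix such as "sentence-transformers/...";
    # replace the slash so each model gets one flat directory.
    return os.path.join(root, model_name.replace("/", "__"))


def save_models(model_names, root: str = "models") -> list:
    """Download each model once and save it under root; run before docker build."""
    # Imported lazily so local_model_dir is usable without sentence-transformers.
    from sentence_transformers import SentenceTransformer

    saved = []
    for name in model_names:
        path = local_model_dir(root, name)
        # save() creates the target directory if it does not exist yet
        SentenceTransformer(name).save(path)
        saved.append(path)
    return saved
```

At runtime you would then pass the matching directory to KeyBERT, for example KeyBERT(local_model_dir("models", "all-MiniLM-L6-v2")).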