I'm getting started with the transformers library from HuggingFace.co. When I run the file below, it works, but I get an error indicating I'm not following best practices.
from transformers import pipeline
print(pipeline('sentiment-analysis')('I love you'))
When I run that file with python main.py
I get this error:
2.014 No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
2.014 Using a pipeline without specifying a model name and revision in production is not recommended.
But when I google that error message, nothing shows up explaining the best practice for specifying a model and revision. Additionally, assuming it's not too involved, I'd also like to see an elegant way of caching the model, say, when building a Docker image from a Dockerfile. Is there perhaps a way of having models download by adding them to the requirements.txt file?
The message you see there is not an error, it is just a warning. It tells you that the pipeline is using distilbert-base-uncased-finetuned-sst-2-english because you haven't specified a model_id, which means it might not yield the best results for your use case.
There are plenty of models available on the Hub; you should play around with a few to find the one that yields the best results for your data. You can use one by passing the model parameter when constructing the pipeline:
from transformers import pipeline
# any sentiment/text-classification model from the Hub works here
model_id = "cardiffnlp/twitter-roberta-base-sentiment-latest"
sentiment_pipe = pipeline("sentiment-analysis", model=model_id)
print(sentiment_pipe('I love you'))
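The warning also mentions a revision. If you want reproducible behaviour, you can pin one via the revision argument of pipeline() as well; "main" below is only a placeholder, the stricter option is the commit hash shown on the model page:
from transformers import pipeline
model_id = "cardiffnlp/twitter-roberta-base-sentiment-latest"
# revision accepts a branch name or a commit hash; "main" is just a placeholder here
sentiment_pipe = pipeline("sentiment-analysis", model=model_id, revision="main")
print(sentiment_pipe('I love you'))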
Regarding Docker:
Just run a .py script that loads the pipeline once as a step in your Dockerfile; the files will be cached automatically (see the sketch after the next code block). You can also save the pipeline directly to a specific location with save_pretrained:
# run this either before building the container and cp the `my_pipe` directory into the container later,
# or run it as a build step inside the container
sentiment_pipe.save_pretrained("my_pipe")
# load it inside the container from the local directory
sentiment_pipe = pipeline("sentiment-analysis", model="my_pipe")
print(sentiment_pipe('I love you'))
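For the caching approach mentioned above, here is a minimal sketch of such a pre-download script, assuming it is saved as download_model.py and invoked with RUN python download_model.py during the image build (both names are just examples):
# download_model.py -- example build step; loading the pipeline once pulls the
# weights into the Hugging Face cache baked into the image, so the container
# does not need to download anything at runtime
from transformers import pipeline
model_id = "cardiffnlp/twitter-roberta-base-sentiment-latest"
pipeline("sentiment-analysis", model=model_id)
As for requirements.txt: the models themselves are not pip packages, so they cannot be pulled in that way; a pre-download script like this (or save_pretrained plus copying the directory) is the usual substitute.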