python huggingface-transformers text-classification large-language-model zeroshot-classification

How does Huggingface's zero-shot classification work in production/webapp, do I need to train the model first?

I have already used huggingface's zero-shot classification: I used "facebook/bart-large-mnli" model as reported here (https://huggingface.co/tasks/zero-shot-classification). The accuracy is quite good for my task.

My question is about productionizing the code: In particular I would like to create a Gradio (or streamlit) webapp. Do I need to train the "facebook/bart-large-mnli" model first, secondly save the model in a pickle file, and then predict a new (unseen) sentence using the pickle file?
Or can I simply import the "facebook/bart-large-mnli" library and compute the prediction for the production/webapp code?

The latter scenario would be preferable. But I am not sure whether loading the model from scratch would produce the same output as loadingthe pickle file with the saved facebook/bart-large-mnli" model.

Thank you in advance.

Solution

Q: How does zero-shot classification work? Do I need train/tune the model to use in production?

Options:

(i) train the "facebook/bart-large-mnli" model first, secondly save the model in a pickle file, and then predict a new (unseen) sentence using the pickle file? or
(ii) can I simply import the "facebook/bart-large-mnli" library and compute the prediction for the production/webapp code?

A (human): (ii) You can load up the model with pipeline("zero-shot-classification", model="facebook/bart-large-mnli") once when the server start, then reuse the pipeline without re-initializing it for each request.

When you use the model off-the-shelf, it'll be zero-shot but if you fine-tune a model with limited training data, people commonly refer to that as "few-shot"; take a look at https://github.com/huggingface/setfit for few-shot learning.

The proof is in the pudding, see if the model you pick fits the task you want. Also, there's more than one way to wield the shiny hammer =)

Disclaimer: Your Miles May Vary...

Zero shot classification

TL;DR: I don't want to train anything, I don't have labeled data, do something with some labels that I come up with.

from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = "Catan (Base Game) | Ages 10+ | for 3 to 4 Players | Average Playtime 60 Minutes | Made by Catan Studio | TRADE, BUILD AND SETTLE: Embark on a quest to settle the isle of Catan! Guide your settlers to victory by clever trading and cunning development. But beware! Someone might cut off your road or buy a monopoly. And you never know when the wily robber might steal some of your precious games!"

candidate_labels = ['Beauty & Wellness', 'Electronics', 'Toys & Games']

classifier(text, candidate_labels)

[out]:

{'sequence': 'Catan (Base Game) | Ages 10+ | for 3 to 4 Players | Average Playtime 60 Minutes | Made by Catan Studio | TRADE, BUILD AND SETTLE: Embark on a quest to settle the isle of Catan! Guide your settlers to victory by clever trading and cunning development. But beware! Someone might cut off your road or buy a monopoly. And you never know when the wily robber might steal some of your precious games!',
 'labels': ['Toys & Games', 'Electronics', 'Beauty & Wellness'],
 'scores': [0.511284351348877, 0.38416239619255066, 0.10455326735973358]}

Don't classify, translate (or seq2seq)

Inspiration: https://arxiv.org/abs/1812.05774

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-large"

tokenizer= AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "Catan (Base Game) | Ages 10+ | for 3 to 4 Players | Average Playtime 60 Minutes | Made by Catan Studio | TRADE, BUILD AND SETTLE: Embark on a quest to settle the isle of Catan! Guide your settlers to victory by clever trading and cunning development. But beware! Someone might cut off your road or buy a monopoly. And you never know when the wily robber might steal some of your precious games!"


prompt=f"""Which category is this product?
QUERY:{text}
OPTIONS:
 - Beauty & Wellness
 - Electronics
 - Toys & Games
"""

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

tokenizer.decode(model.generate(input_ids)[0], skip_special_tokens=True)

[out]:

Toys & Games

And for the fun of it =)

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-large"

tokenizer= AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

prompt=f"""How does zero-shot classification work? 
QUERY: Do I need tune/modify the model to use in production?
OPTIONS:
 - (i) train the "facebook/bart-large-mnli" model first, secondly save the model in a pickle file, and then predict a new (unseen) sentence using the pickle file
 - (ii) can I simply import the "facebook/bart-large-mnli" library and compute the prediction for the production/webapp code
"""

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

print(tokenizer.decode(model.generate(input_ids)[0], skip_special_tokens=True))

[out]:

(ii)

Q: What if both methods above don't work?

A: Try more models from https://huggingface.co/models or try different tasks and be creative in how to use what's available to fit your data to solve the problem

Q: What if none of the models/tasks works?

A: Then it's time to think about what data you can/need to collect to train the model you need. But before collecting the data, it'll be prudent to first decide how you want to evaluate/measure the success of the model, e.g. F1-score, accuracy, etc.

This is how I'll personally solve NLP problems that fits the frame "X problem, Y approach" solutions, https://hackernoon.com/what-kind-of-scientist-are-you (shameless plug)

Q: How do I deploy a model after I found the model+task I want?

There're several ways but it'll be out-of-scope of this question, since it's asking about how zero-shot works and more pertinently "Can I use zero-shot classification models off-the-shelf without training?".

To deploy a model, take a look at: