I am using a fine-tuned Hugging Face model (trained on my company's data) with the TextClassificationPipeline to make class predictions. The labels this pipeline predicts default to LABEL_0, LABEL_1, and so on. Is there a way to supply the label mapping to the TextClassificationPipeline object so that the output reflects it?
Env:
- tensorflow==2.3.1
- transformers==4.3.2
Sample Code:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # or any {'0', '1', '2'}
from transformers import TextClassificationPipeline, TFAutoModelForSequenceClassification, AutoTokenizer
MODEL_DIR = "path\to\my\fine-tuned\model"
# Feature extraction pipeline
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
pipeline = TextClassificationPipeline(model=model,
                                      tokenizer=tokenizer,
                                      framework='tf',
                                      device=0)
result = pipeline("It was a good watch. But a little boring.")[0]
Output:
In [2]: result
Out[2]: {'label': 'LABEL_1', 'score': 0.8864616751670837}
The simplest way to add such a mapping is to edit the model's config.json to contain an id2label field, as below:
{
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "id2label": {
    "0": "negative",
    "1": "positive"
  },
  "attention_dropout": 0.1,
  ...
}
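To sanity-check the edit, you can load just the config and inspect the mapping. A minimal sketch, assuming MODEL_DIR is the same path as above (AutoConfig is part of transformers):
from transformers import AutoConfig

# Load only the config to confirm the id2label mapping was picked up
config = AutoConfig.from_pretrained(MODEL_DIR)
print(config.id2label)  # expected: {0: 'negative', 1: 'positive'}
Note that transformers converts the string keys from the JSON into integers when it loads the config.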
An in-code way to set this mapping is to pass the id2label param in the from_pretrained call, as below:
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_DIR, id2label={0: 'negative', 1: 'positive'})
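With either approach in place, re-running the pipeline should return the human-readable label instead of LABEL_*. A minimal sketch reusing the setup above (the score shown is illustrative, not an actual output):
pipeline = TextClassificationPipeline(model=model,
                                      tokenizer=tokenizer,
                                      framework='tf',
                                      device=0)
result = pipeline("It was a good watch. But a little boring.")[0]
# e.g. {'label': 'positive', 'score': 0.88...}
If you want the in-code mapping to persist, model.save_pretrained(MODEL_DIR) writes the updated config.json back to disk.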
Here is the GitHub issue I raised to get this added to the documentation of transformers.XForSequenceClassification.