I have a text classification task and I want to use the pre-trained RoBERTa model from the transformers Python library.
As per the documentation of TFRobertaForSequenceClassification, to train we have to use:
from transformers import RobertaTokenizer, TFRobertaForSequenceClassification
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = TFRobertaForSequenceClassification.from_pretrained('roberta-base')
model.compile('adam', loss='sparse_categorical_crossentropy')
model.fit(x, y)
So where should I specify the number of target labels for sequence classification?
You can use the num_labels parameter:
model = TFRobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=5)
ref: https://huggingface.co/transformers/main_classes/configuration.html
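For a fuller picture, here is a minimal end-to-end sketch, assuming a hypothetical 5-class task; the texts and integer labels below are placeholders. Note that the TF model outputs raw logits, so the Keras loss should be built with from_logits=True rather than the plain 'sparse_categorical_crossentropy' string.

import tensorflow as tf
from transformers import RobertaTokenizer, TFRobertaForSequenceClassification

# Placeholder data for a hypothetical 5-class task.
texts = ["great product", "terrible service", "it was okay"]
labels = [4, 0, 2]

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = TFRobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=5)

# Tokenize to TensorFlow tensors; padding/truncation keep shapes uniform.
enc = tokenizer(texts, padding=True, truncation=True, return_tensors='tf')

# The model returns raw logits, so tell the loss not to expect probabilities.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(dict(enc), tf.constant(labels), epochs=1, batch_size=2)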