
Google Colab unable to run Hugging Face model


I would like to tag parts of speech using a BERT model, and I used the Hugging Face library for this purpose.

When I run the model on the Hugging Face API, I get the correct output.

However, when I run the same code on Google Colab, I get an error.

My code:

from transformers import AutoModelWithHeads
from transformers import pipeline
from transformers import AutoTokenizer

model = AutoModelWithHeads.from_pretrained("bert-base-uncased")
adapter_name = model.load_adapter("AdapterHub/bert-base-uncased-pf-ud_pos", source="hf")
model.active_adapters = adapter_name

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
token_classification = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="NONE")
res = token_classification("Take out the trash bag from the bin and replace it.")
print(res)

The error is:

 The model 'BertModelWithHeads' is not supported for token-classification. Supported models are ['AlbertForTokenClassification', 'BertForTokenClassification', 'BigBirdForTokenClassification', 'BloomForTokenClassification', 'CamembertForTokenClassification', 'CanineForTokenClassification', 'ConvBertForTokenClassification', 'Data2VecTextForTokenClassification', 'DebertaForTokenClassification', 'DebertaV2ForTokenClassification', 'DistilBertForTokenClassification', 'ElectraForTokenClassification', 'ErnieForTokenClassification', 'EsmForTokenClassification', 'FlaubertForTokenClassification', 'FNetForTokenClassification', 'FunnelForTokenClassification', 'GPT2ForTokenClassification', 'GPT2ForTokenClassification', 'IBertForTokenClassification', 'LayoutLMForTokenClassification', 'LayoutLMv2ForTokenClassification', 'LayoutLMv3ForTokenClassification', 'LiltForTokenClassification', 'LongformerForTokenClassification', 'LukeForTokenClassification', 'MarkupLMForTokenClassification', 'MegatronBertForTokenClassification', 'MobileBertForTokenClassification', 'MPNetForTokenClassification', 'NezhaForTokenClassification', 'NystromformerForTokenClassification', 'QDQBertForTokenClassification', 'RemBertForTokenClassification', 'RobertaForTokenClassification', 'RobertaPreLayerNormForTokenClassification', 'RoCBertForTokenClassification', 'RoFormerForTokenClassification', 'SqueezeBertForTokenClassification', 'XLMForTokenClassification', 'XLMRobertaForTokenClassification', 'XLMRobertaXLForTokenClassification', 'XLNetForTokenClassification', 'YosoForTokenClassification', 'XLMRobertaAdapterModel', 'RobertaAdapterModel', 'AlbertAdapterModel', 'BeitAdapterModel', 'BertAdapterModel', 'BertGenerationAdapterModel', 'DistilBertAdapterModel', 'DebertaV2AdapterModel', 'DebertaAdapterModel', 'BartAdapterModel', 'MBartAdapterModel', 'GPT2AdapterModel', 'GPTJAdapterModel', 'T5AdapterModel', 'ViTAdapterModel'].
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-18-79b43720402e> in <cell line: 12>()
     10 tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
     11 token_classification = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="NONE")
---> 12 res = token_classification("Take out the trash bag from the bin and replace it.")
     13 print(res)

4 frames
/usr/local/lib/python3.10/dist-packages/transformers/pipelines/token_classification.py in aggregate(self, pre_entities, aggregation_strategy)
    346                 score = pre_entity["scores"][entity_idx]
    347                 entity = {
--> 348                     "entity": self.model.config.id2label[entity_idx],
    349                     "score": score,
    350                     "index": pre_entity["index"],

KeyError: 16

I don't understand: if the model runs fine through the Hugging Face API, why does it fail on Google Colab?

Thank you in advance.


Solution

  • Here is how you can do it. The main change from the code in the question is activating the adapter via set_active=True when loading it, rather than assigning model.active_adapters afterwards:

    from transformers import AutoModelWithHeads, AutoTokenizer, pipeline
    
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelWithHeads.from_pretrained("bert-base-uncased")
    model.load_adapter(
        "AdapterHub/bert-base-uncased-pf-ud_pos",
        source="hf",
        set_active=True,
    )
    token_classification = pipeline(
        "token-classification",
        model=model,
        tokenizer=tokenizer,
    )
    

    Creating the pipeline will still print a warning:

    The model 'BertModelWithHeads' is not supported for token-classification. Supported models are [... the same list as in the error message above ...].
    

    You can safely ignore it; the pipeline still works. Here is an example:

    >>> token_classification("Take out the trash bag from the bin and replace it")
    [{'entity': 'VERB', 'score': 0.99986637, 'index': 1, 'word': 'take', 'start': 0, 'end': 4},
     {'entity': 'ADP', 'score': 0.9829973, 'index': 2, 'word': 'out', 'start': 5, 'end': 8},
     {'entity': 'DET', 'score': 0.9998791, 'index': 3, 'word': 'the', 'start': 9, 'end': 12},
     {'entity': 'NOUN', 'score': 0.9958676, 'index': 4, 'word': 'trash', 'start': 13, 'end': 18},
     {'entity': 'NOUN', 'score': 0.99657273, 'index': 5, 'word': 'bag', 'start': 19, 'end': 22},
     {'entity': 'ADP', 'score': 0.99989176, 'index': 6, 'word': 'from', 'start': 23, 'end': 27},
     {'entity': 'DET', 'score': 0.99982834, 'index': 7, 'word': 'the', 'start': 28, 'end': 31},
     {'entity': 'NOUN', 'score': 0.99584526, 'index': 8, 'word': 'bin', 'start': 32, 'end': 35},
     {'entity': 'CCONJ', 'score': 0.99962616, 'index': 9, 'word': 'and', 'start': 36, 'end': 39},
     {'entity': 'VERB', 'score': 0.99976975, 'index': 10, 'word': 'replace', 'start': 40, 'end': 47},
     {'entity': 'PRON', 'score': 0.9989698, 'index': 11, 'word': 'it', 'start': 48, 'end': 50}]
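
    If you only need plain (word, POS-tag) pairs rather than the full pipeline dicts, a small post-processing step is enough. A minimal sketch (`to_pos_pairs` is our own helper, not part of transformers; the sample list below abbreviates the output shown above):

    ```python
    def to_pos_pairs(results):
        """Reduce token-classification pipeline dicts to (word, tag) tuples,
        dropping scores and character offsets."""
        return [(r["word"], r["entity"]) for r in results]

    # Abbreviated sample of the pipeline output shown above.
    results = [
        {"entity": "VERB", "score": 0.9999, "index": 1, "word": "take", "start": 0, "end": 4},
        {"entity": "DET", "score": 0.9999, "index": 2, "word": "the", "start": 5, "end": 8},
        {"entity": "NOUN", "score": 0.9996, "index": 3, "word": "bag", "start": 9, "end": 12},
    ]

    print(to_pos_pairs(results))
    # [('take', 'VERB'), ('the', 'DET'), ('bag', 'NOUN')]
    ```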