I would like to tag parts of speech using a BERT model, and I am using the Hugging Face transformers library for this purpose. When I run the model through the Hugging Face inference API, I get the expected output; however, when I run the same code on Google Colab, I get an error.
My code:
from transformers import AutoModelWithHeads
from transformers import pipeline
from transformers import AutoTokenizer
model = AutoModelWithHeads.from_pretrained("bert-base-uncased")
adapter_name = model.load_adapter("AdapterHub/bert-base-uncased-pf-ud_pos", source="hf")
model.active_adapters = adapter_name
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
token_classification = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="NONE")
res = token_classification("Take out the trash bag from the bin and replace it.")
print(res)
The error is:
The model 'BertModelWithHeads' is not supported for token-classification. Supported models are ['AlbertForTokenClassification', 'BertForTokenClassification', 'BigBirdForTokenClassification', 'BloomForTokenClassification', 'CamembertForTokenClassification', 'CanineForTokenClassification', 'ConvBertForTokenClassification', 'Data2VecTextForTokenClassification', 'DebertaForTokenClassification', 'DebertaV2ForTokenClassification', 'DistilBertForTokenClassification', 'ElectraForTokenClassification', 'ErnieForTokenClassification', 'EsmForTokenClassification', 'FlaubertForTokenClassification', 'FNetForTokenClassification', 'FunnelForTokenClassification', 'GPT2ForTokenClassification', 'GPT2ForTokenClassification', 'IBertForTokenClassification', 'LayoutLMForTokenClassification', 'LayoutLMv2ForTokenClassification', 'LayoutLMv3ForTokenClassification', 'LiltForTokenClassification', 'LongformerForTokenClassification', 'LukeForTokenClassification', 'MarkupLMForTokenClassification', 'MegatronBertForTokenClassification', 'MobileBertForTokenClassification', 'MPNetForTokenClassification', 'NezhaForTokenClassification', 'NystromformerForTokenClassification', 'QDQBertForTokenClassification', 'RemBertForTokenClassification', 'RobertaForTokenClassification', 'RobertaPreLayerNormForTokenClassification', 'RoCBertForTokenClassification', 'RoFormerForTokenClassification', 'SqueezeBertForTokenClassification', 'XLMForTokenClassification', 'XLMRobertaForTokenClassification', 'XLMRobertaXLForTokenClassification', 'XLNetForTokenClassification', 'YosoForTokenClassification', 'XLMRobertaAdapterModel', 'RobertaAdapterModel', 'AlbertAdapterModel', 'BeitAdapterModel', 'BertAdapterModel', 'BertGenerationAdapterModel', 'DistilBertAdapterModel', 'DebertaV2AdapterModel', 'DebertaAdapterModel', 'BartAdapterModel', 'MBartAdapterModel', 'GPT2AdapterModel', 'GPTJAdapterModel', 'T5AdapterModel', 'ViTAdapterModel'].
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-18-79b43720402e> in <cell line: 12>()
10 tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
11 token_classification = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="NONE")
---> 12 res = token_classification("Take out the trash bag from the bin and replace it.")
13 print(res)
4 frames
/usr/local/lib/python3.10/dist-packages/transformers/pipelines/token_classification.py in aggregate(self, pre_entities, aggregation_strategy)
346 score = pre_entity["scores"][entity_idx]
347 entity = {
--> 348 "entity": self.model.config.id2label[entity_idx],
349 "score": score,
350 "index": pre_entity["index"],
KeyError: 16
I don't understand: if the model runs fine through the Hugging Face API, why does it fail on Google Colab?
Thank you in advance.
Here is how you can do it:
from transformers import AutoModelWithHeads, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelWithHeads.from_pretrained("bert-base-uncased")

# Load the POS-tagging adapter and activate it (together with its
# prediction head) in one step via set_active=True.
model.load_adapter(
    "AdapterHub/bert-base-uncased-pf-ud_pos",
    source="hf",
    set_active=True,
)

token_classification = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
)
Creating the pipeline still prints the same warning you saw above:
The model 'BertModelWithHeads' is not supported for token-classification. Supported models are [...].
You can just ignore it; the pipeline will still work. Here is an example:
>>> token_classification("Take out the trash bag from the bin and replace it")
[{'entity': 'VERB', 'score': 0.99986637, 'index': 1, 'word': 'take', 'start': 0, 'end': 4},
{'entity': 'ADP', 'score': 0.9829973, 'index': 2, 'word': 'out', 'start': 5, 'end': 8},
{'entity': 'DET', 'score': 0.9998791, 'index': 3, 'word': 'the', 'start': 9, 'end': 12},
{'entity': 'NOUN', 'score': 0.9958676, 'index': 4, 'word': 'trash', 'start': 13, 'end': 18},
{'entity': 'NOUN', 'score': 0.99657273, 'index': 5, 'word': 'bag', 'start': 19, 'end': 22},
{'entity': 'ADP', 'score': 0.99989176, 'index': 6, 'word': 'from', 'start': 23, 'end': 27},
{'entity': 'DET', 'score': 0.99982834, 'index': 7, 'word': 'the', 'start': 28, 'end': 31},
{'entity': 'NOUN', 'score': 0.99584526, 'index': 8, 'word': 'bin', 'start': 32, 'end': 35},
{'entity': 'CCONJ', 'score': 0.99962616, 'index': 9, 'word': 'and', 'start': 36, 'end': 39},
{'entity': 'VERB', 'score': 0.99976975, 'index': 10, 'word': 'replace', 'start': 40, 'end': 47},
{'entity': 'PRON', 'score': 0.9989698, 'index': 11, 'word': 'it', 'start': 48, 'end': 50}]
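If you only need clean (word, tag) pairs, you can post-process this output yourself. Here is a minimal sketch; the helper name to_pos_pairs is just for illustration and is not part of transformers:
def to_pos_pairs(results):
    # Collapse the pipeline output into simple (word, POS-tag) tuples.
    return [(item["word"], item["entity"]) for item in results]

res = token_classification("Take out the trash bag from the bin and replace it")
print(to_pos_pairs(res))
# [('take', 'VERB'), ('out', 'ADP'), ('the', 'DET'), ('trash', 'NOUN'), ...]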