
Google Colab unable to run Hugging Face model


I would like to tag parts of speech using a BERT model, and I used the Hugging Face library for this purpose.

When I run the model on the Hugging Face API, I get the correct output.

However, when I run the same code on Google Colab, I get an error.

My code:

from transformers import AutoModelWithHeads
from transformers import pipeline
from transformers import AutoTokenizer

model = AutoModelWithHeads.from_pretrained("bert-base-uncased")
adapter_name = model.load_adapter("AdapterHub/bert-base-uncased-pf-ud_pos", source="hf")
model.active_adapters = adapter_name

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
token_classification = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="NONE")
res = token_classification("Take out the trash bag from the bin and replace it.")
print(res)

The error is:

 The model 'BertModelWithHeads' is not supported for token-classification. Supported models are ['AlbertForTokenClassification', 'BertForTokenClassification', 'BigBirdForTokenClassification', 'BloomForTokenClassification', 'CamembertForTokenClassification', 'CanineForTokenClassification', 'ConvBertForTokenClassification', 'Data2VecTextForTokenClassification', 'DebertaForTokenClassification', 'DebertaV2ForTokenClassification', 'DistilBertForTokenClassification', 'ElectraForTokenClassification', 'ErnieForTokenClassification', 'EsmForTokenClassification', 'FlaubertForTokenClassification', 'FNetForTokenClassification', 'FunnelForTokenClassification', 'GPT2ForTokenClassification', 'GPT2ForTokenClassification', 'IBertForTokenClassification', 'LayoutLMForTokenClassification', 'LayoutLMv2ForTokenClassification', 'LayoutLMv3ForTokenClassification', 'LiltForTokenClassification', 'LongformerForTokenClassification', 'LukeForTokenClassification', 'MarkupLMForTokenClassification', 'MegatronBertForTokenClassification', 'MobileBertForTokenClassification', 'MPNetForTokenClassification', 'NezhaForTokenClassification', 'NystromformerForTokenClassification', 'QDQBertForTokenClassification', 'RemBertForTokenClassification', 'RobertaForTokenClassification', 'RobertaPreLayerNormForTokenClassification', 'RoCBertForTokenClassification', 'RoFormerForTokenClassification', 'SqueezeBertForTokenClassification', 'XLMForTokenClassification', 'XLMRobertaForTokenClassification', 'XLMRobertaXLForTokenClassification', 'XLNetForTokenClassification', 'YosoForTokenClassification', 'XLMRobertaAdapterModel', 'RobertaAdapterModel', 'AlbertAdapterModel', 'BeitAdapterModel', 'BertAdapterModel', 'BertGenerationAdapterModel', 'DistilBertAdapterModel', 'DebertaV2AdapterModel', 'DebertaAdapterModel', 'BartAdapterModel', 'MBartAdapterModel', 'GPT2AdapterModel', 'GPTJAdapterModel', 'T5AdapterModel', 'ViTAdapterModel'].
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-18-79b43720402e> in <cell line: 12>()
     10 tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
     11 token_classification = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="NONE")
---> 12 res = token_classification("Take out the trash bag from the bin and replace it.")
     13 print(res)

4 frames
/usr/local/lib/python3.10/dist-packages/transformers/pipelines/token_classification.py in aggregate(self, pre_entities, aggregation_strategy)
    346                 score = pre_entity["scores"][entity_idx]
    347                 entity = {
--> 348                     "entity": self.model.config.id2label[entity_idx],
    349                     "score": score,
    350                     "index": pre_entity["index"],

KeyError: 16

I don't understand: if the model runs fine through the Hugging Face API, why does it fail on Google Colab?

Thank you in advance.


Solution

  • Here is how you can do it. The main change from the code in the question is activating the adapter via set_active=True when loading it, rather than assigning model.active_adapters afterwards:

    from transformers import AutoModelWithHeads, AutoTokenizer, pipeline
    
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelWithHeads.from_pretrained("bert-base-uncased")
    model.load_adapter(
        "AdapterHub/bert-base-uncased-pf-ud_pos",
        source="hf",
        set_active=True,
    )
    token_classification = pipeline(
        "token-classification",
        model=model,
        tokenizer=tokenizer,
    )
    

    Creating the pipeline will still print a warning:

    The model 'BertModelWithHeads' is not supported for token-classification. Supported models are [... the same list as in the error message above ...].
    

    You can safely ignore it; the pipeline still works. Here is an example:

    >>> token_classification("Take out the trash bag from the bin and replace it")
    [{'entity': 'VERB', 'score': 0.99986637, 'index': 1, 'word': 'take', 'start': 0, 'end': 4},
     {'entity': 'ADP', 'score': 0.9829973, 'index': 2, 'word': 'out', 'start': 5, 'end': 8},
     {'entity': 'DET', 'score': 0.9998791, 'index': 3, 'word': 'the', 'start': 9, 'end': 12},
     {'entity': 'NOUN', 'score': 0.9958676, 'index': 4, 'word': 'trash', 'start': 13, 'end': 18},
     {'entity': 'NOUN', 'score': 0.99657273, 'index': 5, 'word': 'bag', 'start': 19, 'end': 22},
     {'entity': 'ADP', 'score': 0.99989176, 'index': 6, 'word': 'from', 'start': 23, 'end': 27},
     {'entity': 'DET', 'score': 0.99982834, 'index': 7, 'word': 'the', 'start': 28, 'end': 31},
     {'entity': 'NOUN', 'score': 0.99584526, 'index': 8, 'word': 'bin', 'start': 32, 'end': 35},
     {'entity': 'CCONJ', 'score': 0.99962616, 'index': 9, 'word': 'and', 'start': 36, 'end': 39},
     {'entity': 'VERB', 'score': 0.99976975, 'index': 10, 'word': 'replace', 'start': 40, 'end': 47},
     {'entity': 'PRON', 'score': 0.9989698, 'index': 11, 'word': 'it', 'start': 48, 'end': 50}]
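
    If you only need plain (word, POS-tag) pairs rather than the full pipeline dicts, a small post-processing step is enough. A minimal sketch (`to_pos_pairs` is our own helper, not part of transformers; the sample list below abbreviates the output shown above):

    ```python
    def to_pos_pairs(results):
        """Reduce token-classification pipeline dicts to (word, tag) tuples,
        dropping scores and character offsets."""
        return [(r["word"], r["entity"]) for r in results]

    # Abbreviated sample of the pipeline output shown above.
    results = [
        {"entity": "VERB", "score": 0.9999, "index": 1, "word": "take", "start": 0, "end": 4},
        {"entity": "DET", "score": 0.9999, "index": 2, "word": "the", "start": 5, "end": 8},
        {"entity": "NOUN", "score": 0.9996, "index": 3, "word": "bag", "start": 9, "end": 12},
    ]

    print(to_pos_pairs(results))
    # [('take', 'VERB'), ('the', 'DET'), ('bag', 'NOUN')]
    ```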