from textwrap import TextWrapper
from transformers import pipeline

wrapper = TextWrapper(width=80)  # assumed width; the original snippet used `wrapper` without defining it
sentence = 'American Airlines was the first airline to fly every A380 flight perfectly when President George Bush was in Office. The Woodlands Texas is a great place to be.'
ner = pipeline('text-classification', model='dbmdz/bert-large-cased-finetuned-conll03-english', grouped_entities=True)
ners = ner(sentence)
print('\nSentence:')
print(wrapper.fill(sentence))
print('\n')
for n in ners:
    print(f"{n['word']} -> {n['entity_group']}")
I am working in Google Colab. Assuming the error was caused by a bug in the transformers library, I tried upgrading to the latest version:
!pip install transformers --upgrade
but I still received the following error:
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py in _encode_plus(self, text, text_pair, add_special_tokens, padding_strategy, truncation_strategy, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
574 ) -> BatchEncoding:
575 batched_input = [(text, text_pair)] if text_pair else [text]
--> 576 batched_output = self._batch_encode_plus(
577 batched_input,
578 is_split_into_words=is_split_into_words,
TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'grouped_entities'
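There may be some confusion here: named entity recognition (NER) is a token-classification task, not a text-classification task. The text-classification pipeline does not recognize grouped_entities, so it forwards the argument to the tokenizer, which rejects it with the TypeError above. Please update your code: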
ner = pipeline(
'token-classification',
model='dbmdz/bert-large-cased-finetuned-conll03-english',
grouped_entities=True
)  # "ner" also works as an alias for "token-classification"
This works, but it raises a deprecation warning:
UserWarning: `grouped_entities` is deprecated and will be removed in version v5.0.0, defaulted to `aggregation_strategy="simple"` instead.
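If other code still passes the deprecated flags, a small helper can translate them into an aggregation_strategy value before calling pipeline(...). The helper name legacy_to_aggregation_strategy is hypothetical. The "simple" mapping comes straight from the warning above; mapping grouped_entities=True together with the also-deprecated ignore_subwords=True to "first" is my reading of the transformers source, so please double-check it against your installed version.

```python
from typing import Optional


def legacy_to_aggregation_strategy(
    grouped_entities: Optional[bool] = None,
    ignore_subwords: Optional[bool] = None,
) -> str:
    """Hypothetical helper: map the deprecated NER pipeline flags to an
    aggregation_strategy string (assumed mapping, see the note above)."""
    if grouped_entities and ignore_subwords:
        return "first"   # assumption: group entities, label words by their first sub-token
    if grouped_entities:
        return "simple"  # the mapping stated in the deprecation warning
    return "none"        # no grouping: one prediction per token
```

For example, aggregation_strategy=legacy_to_aggregation_strategy(grouped_entities=True) evaluates to 'simple', which is the value used in the updated code.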
Updated code using aggregation_strategy:
# Updated code with 'aggregation_strategy'
ner = pipeline(
'ner',
model='dbmdz/bert-large-cased-finetuned-conll03-english',
aggregation_strategy='simple'
)
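To see roughly what aggregation_strategy='simple' does with the per-token predictions, here is a simplified pure-Python sketch of the grouping idea: adjacent tokens with the same entity type are merged, the B-/I- prefix is dropped to form entity_group, and the group score is the mean of the token scores. The helper group_bio_tokens and the toy predictions are hypothetical and not the transformers implementation; the real pipeline also merges sub-word pieces using the tokenizer, which this sketch skips by assuming whole-word tokens.

```python
from typing import Any, Dict, List, Optional


def group_bio_tokens(tokens: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge adjacent tokens tagged with the same entity type into one group.

    Simplified illustration (hypothetical helper, not the transformers code):
    a B- tag always starts a new group, "O" tokens close the current group,
    and the group score is the mean of its token scores.
    """
    groups: List[Dict[str, Any]] = []
    current: Optional[Dict[str, Any]] = None  # group currently being extended
    for tok in tokens:
        tag = tok["entity"]
        if tag == "O":
            current = None  # outside any entity: close the open group
            continue
        prefix, _, label = tag.partition("-")  # e.g. "I-ORG" -> ("I", "-", "ORG")
        # Start a new group on a B- tag or when the entity type changes.
        if current is None or prefix == "B" or current["entity_group"] != label:
            current = {"entity_group": label, "words": [], "scores": []}
            groups.append(current)
        current["words"].append(tok["word"])
        current["scores"].append(tok["score"])
    return [
        {
            "entity_group": g["entity_group"],
            "word": " ".join(g["words"]),  # assumes whole-word tokens
            "score": sum(g["scores"]) / len(g["scores"]),
        }
        for g in groups
    ]


# Toy per-token predictions (made-up labels and scores, for illustration only).
toy_predictions = [
    {"word": "American", "entity": "I-ORG", "score": 1.0},
    {"word": "Airlines", "entity": "I-ORG", "score": 0.5},
    {"word": "was", "entity": "O", "score": 1.0},
    {"word": "George", "entity": "I-PER", "score": 0.5},
    {"word": "Bush", "entity": "I-PER", "score": 0.5},
]

for n in group_bio_tokens(toy_predictions):
    print(f"{n['word']} -> {n['entity_group']}")
```

The output dictionaries use the same word and entity_group keys as the pipeline results, so the print loop from the question works on them unchanged.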