
Huggingface SciBERT predict masked word not working


I am trying to use the pretrained SciBERT model (https://huggingface.co/allenai/scibert_scivocab_uncased) from Huggingface to predict masked words in scientific/biomedical text. This produces an error, and I'm not sure how to move forward from here.

Here is the code so far:

!pip install transformers

from transformers import pipeline, AutoTokenizer, AutoModel
  
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")

model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
unmasker("the patient is a 55 year old [MASK] admitted with pneumonia")

The following works with plain BERT, but that is not the specialized pre-trained model:

!pip install transformers

from transformers import pipeline

unmasker = pipeline('fill-mask', model='bert-base-uncased')
unmasker("the patient is a 55 year old [MASK] admitted with pneumonia")

The error with SciBERT is:

/usr/local/lib/python3.7/dist-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, feature_extractor, framework, revision, use_fast, use_auth_token, model_kwargs, **kwargs)
    494         kwargs["feature_extractor"] = feature_extractor
    495 
--> 496     return task_class(model=model, framework=framework, task=task, **kwargs)

/usr/local/lib/python3.7/dist-packages/transformers/pipelines/fill_mask.py in __init__(self, model, tokenizer, modelcard, framework, args_parser, device, top_k, task)
     73         )
     74 
---> 75         self.check_model_type(TF_MODEL_WITH_LM_HEAD_MAPPING if self.framework == "tf" else MODEL_FOR_MASKED_LM_MAPPING)
     76         self.top_k = top_k
     77 

/usr/local/lib/python3.7/dist-packages/transformers/pipelines/base.py in check_model_type(self, supported_models)
    652                 self.task,
    653                 self.model.base_model_prefix,
--> 654                 f"The model '{self.model.__class__.__name__}' is not supported for {self.task}. Supported models are {supported_models}",
    655             )
    656 

PipelineException: The model 'BertModel' is not supported for fill-mask. Supported models are ['BigBirdForMaskedLM', 'Wav2Vec2ForMaskedLM', 'ConvBertForMaskedLM', 'LayoutLMForMaskedLM', 'DistilBertForMaskedLM', 'AlbertForMaskedLM', 'BartForConditionalGeneration', 'MBartForConditionalGeneration', 'CamembertForMaskedLM', 'XLMRobertaForMaskedLM', 'LongformerForMaskedLM', 'RobertaForMaskedLM', 'SqueezeBertForMaskedLM', 'BertForMaskedLM', 'MegatronBertForMaskedLM', 'MobileBertForMaskedLM', 'FlaubertWithLMHeadModel', 'XLMWithLMHeadModel', 'ElectraForMaskedLM', 'ReformerForMaskedLM', 'FunnelForMaskedLM', 'MPNetForMaskedLM', 'TapasForMaskedLM', 'DebertaForMaskedLM', 'DebertaV2ForMaskedLM', 'IBertForMaskedLM']
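To see what the exception is pointing at, here is a minimal offline sketch comparing the two auto classes. It assumes PyTorch and transformers are installed, and it uses a tiny, randomly initialized BERT configuration (the sizes are arbitrary placeholders, not SciBERT's) so nothing has to be downloaded. For the same config, AutoModel builds the bare BertModel encoder, which outputs hidden states, while AutoModelForMaskedLM builds BertForMaskedLM, which outputs one score per vocabulary entry at every position. The fill-mask pipeline needs the latter.

```python
import torch
from transformers import AutoModel, AutoModelForMaskedLM, BertConfig

# Tiny, randomly initialized BERT config (hypothetical sizes) standing in for
# the real SciBERT config, so the sketch runs without downloading weights.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)

# AutoModel maps a BERT config to the bare encoder (no prediction head).
base_model = AutoModel.from_config(config)
# AutoModelForMaskedLM maps the same config to BERT plus its MLM head.
mlm_model = AutoModelForMaskedLM.from_config(config)

print(type(base_model).__name__)  # BertModel
print(type(mlm_model).__name__)   # BertForMaskedLM

# Arbitrary token ids within the toy vocabulary; batch of 1, length 4.
input_ids = torch.tensor([[2, 5, 7, 3]])
with torch.no_grad():
    hidden = base_model(input_ids).last_hidden_state  # per-token hidden states
    logits = mlm_model(input_ids).logits              # per-token vocabulary scores

print(hidden.shape)  # torch.Size([1, 4, 32])  -> hidden_size
print(logits.shape)  # torch.Size([1, 4, 100]) -> vocab_size
```

Only the second output can be turned into "which word fills the mask", which is why the pipeline rejects BertModel.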

Solution

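  • As the error message says, the fill-mask pipeline needs a model with a masked-language-modeling head. AutoModel loads only the bare encoder (BertModel), which has no such head. Load the checkpoint with AutoModelForMaskedLM instead, which resolves to BertForMaskedLM here: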

    from transformers import pipeline, AutoTokenizer, AutoModelForMaskedLM
    tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
    model = AutoModelForMaskedLM.from_pretrained("allenai/scibert_scivocab_uncased")
    unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
    unmasker("the patient is a 55 year old [MASK] admitted with pneumonia")
    

    Output:

    [{'sequence': 'the patient is a 55 year old woman admitted with pneumonia',
      'score': 0.4025486707687378,
      'token': 10221,
      'token_str': 'woman'},
     {'sequence': 'the patient is a 55 year old man admitted with pneumonia',
      'score': 0.23970800638198853,
      'token': 508,
      'token_str': 'man'},
     {'sequence': 'the patient is a 55 year old female admitted with pneumonia',
      'score': 0.15444642305374146,
      'token': 3672,
      'token_str': 'female'},
     {'sequence': 'the patient is a 55 year old male admitted with pneumonia',
      'score': 0.1111455038189888,
      'token': 3398,
      'token_str': 'male'},
     {'sequence': 'the patient is a 55 year old boy admitted with pneumonia',
      'score': 0.015877680853009224,
      'token': 12481,
      'token_str': 'boy'}]
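For a closer look at what the fill-mask pipeline computes, the sketch below performs roughly the same steps by hand: tokenize, find the [MASK] position, run the masked-LM model, apply a softmax over the vocabulary at that position, and keep the top k tokens. It is an approximation of the pipeline's behavior, not its actual source, and top_k_masked is a hypothetical helper name. To stay runnable offline, the demo uses a toy vocabulary and a tiny randomly initialized BertForMaskedLM (hypothetical sizes), so its predictions are meaningless; with the real model you would pass the tokenizer and the AutoModelForMaskedLM instance loaded from allenai/scibert_scivocab_uncased instead.

```python
import os
import tempfile

import torch
from transformers import BertConfig, BertForMaskedLM, BertTokenizer


def top_k_masked(text, tokenizer, model, k=5):
    """Hypothetical helper: top-k (token, probability) pairs for the single mask in text."""
    inputs = tokenizer(text, return_tensors="pt")
    # Positions where the tokenizer emitted the mask token id.
    mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    if len(mask_positions) != 1:
        raise ValueError("expected exactly one mask token in the input")
    model.eval()  # disable dropout for inference
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    # Probability distribution over the vocabulary at the mask position.
    probs = logits[0, mask_positions[0]].softmax(dim=-1)
    top = probs.topk(k)  # sorted from most to least likely
    return [
        (tokenizer.convert_ids_to_tokens(int(idx)), float(p))
        for p, idx in zip(top.values, top.indices)
    ]


# Offline stand-in for SciBERT (assumption): a toy vocabulary and a tiny random model.
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "the", "patient", "is", "a",
         "woman", "man", "admitted", "with", "pneumonia"]
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")
    # BertTokenizer reads the vocab file eagerly, so it stays usable after cleanup.
    tokenizer = BertTokenizer(vocab_file=vocab_path)

torch.manual_seed(0)
config = BertConfig(vocab_size=len(vocab), hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64)
model = BertForMaskedLM(config)

predictions = top_k_masked("the patient is a [MASK] admitted with pneumonia", tokenizer, model, k=3)
for token, prob in predictions:
    print(f"{token}\t{prob:.4f}")
```

The pipeline adds conveniences on top of this (multiple masks, target words, rebuilding the full sequence string), but the core scoring step is the softmax over the MLM head's logits at the mask position, which the bare BertModel cannot provide.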