python · huggingface-transformers · bert-language-model

Unable to load SpanBERT model with the transformers package


I have a question about loading SpanBERT using the transformers package.

I downloaded the pre-trained files from the SpanBERT GitHub repo and vocab.txt from BERT. Here is the code I used for loading:

model = BertModel.from_pretrained(config_file=config_file,
                                  pretrained_model_name_or_path=model_file,
                                  vocab_file=vocab_file)
model.to("cuda")

where

  • config_file -> config.json
  • model_file -> pytorch_model.bin
  • vocab_file -> vocab.txt

But the above code raised a UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte.

I also tried loading SpanBERT with the method mentioned here, but it returned OSError: file SpanBERT/spanbert-base-cased not found.

Do you have any suggestions on how to load the pre-trained model correctly? Any help is much appreciated. Thanks!


Solution

    1. Download the pre-trained weights from the GitHub page.

    https://github.com/facebookresearch/SpanBERT

    SpanBERT (base & cased): 12-layer, 768-hidden, 12-heads, 110M parameters

    SpanBERT (large & cased): 24-layer, 1024-hidden, 16-heads, 340M parameters

    2. Extract them to a folder; for example, I extracted mine to a spanbert_hf_base folder, which contains a .bin file and a config.json file. (One way to script the download and extraction is sketched below.)
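
    If you prefer to script this step, here is a minimal sketch of downloading and extracting the release archive. The URL below is a placeholder, not the real link; substitute the actual download link for the base or large model from the SpanBERT GitHub page, and note that the sketch assumes the archive is a .tar.gz.

    import tarfile
    import urllib.request
    from pathlib import Path

    # Placeholder URL: replace with the real download link from the SpanBERT GitHub page.
    archive_url = "https://example.com/spanbert_hf_base.tar.gz"
    archive_path = Path("spanbert_hf_base.tar.gz")
    target_dir = Path("spanbert_hf_base")

    # Download the archive once.
    if not archive_path.exists():
        urllib.request.urlretrieve(archive_url, archive_path)

    # Extract pytorch_model.bin and config.json into the target folder.
    target_dir.mkdir(exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_dir)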

    3. You can use AutoModel to load the model and a plain BERT tokenizer. From their repo:

    These models have the same format as the HuggingFace BERT models, so you can easily replace them with our SpanBERT models.

    import torch
    from transformers import AutoModel, BertTokenizer

    # Load the extracted SpanBERT checkpoint (the folder containing pytorch_model.bin and config.json).
    model = AutoModel.from_pretrained('spanbert_hf_base/')

    # SpanBERT was trained with the cased BERT vocabulary, so the cased tokenizer matches the checkpoint.
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')

    # Encode a sample sentence and add a batch dimension.
    b = torch.tensor(tokenizer.encode('hi this is me, mr. meeseeks',
                                      add_special_tokens=True,
                                      max_length=512,
                                      truncation=True)).unsqueeze(0)

    out = model(b)
    

    Out:

    (tensor([[[-0.1204, -0.0806, -0.0168,  ..., -0.0599, -0.1932, -0.0967],
              [-0.0851, -0.0980,  0.0039,  ..., -0.0563, -0.1655, -0.0156],
              [-0.1111, -0.0318,  0.0141,  ..., -0.0518, -0.1068, -0.1271],
              [-0.0317, -0.0441, -0.0306,  ..., -0.1049, -0.1940, -0.1919],
              [-0.1200,  0.0277, -0.0372,  ..., -0.0930, -0.0627,  0.0143],
              [-0.1204, -0.0806, -0.0168,  ..., -0.0599, -0.1932, -0.0967]]],
            grad_fn=<NativeLayerNormBackward>),
     tensor([[-9.7530e-02,  1.6328e-01,  9.3202e-03,  1.1010e-01,  7.3047e-02,
              -1.7635e-01,  1.0046e-01, -1.4826e-02,  9.2583e-
             ............
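
    Since the extracted folder contains only the weights and config (no vocab or tokenizer files), it can be convenient to save the tokenizer into the same folder once, so that both the model and the tokenizer load from a single local path afterwards. A minimal sketch, assuming the spanbert_hf_base/ folder from above:

    from transformers import AutoModel, AutoTokenizer, BertTokenizer

    # One-time step: write vocab.txt and the tokenizer config next to the SpanBERT weights.
    BertTokenizer.from_pretrained('bert-base-cased').save_pretrained('spanbert_hf_base/')

    # Afterwards both pieces can be loaded from the same local folder.
    model = AutoModel.from_pretrained('spanbert_hf_base/')
    tokenizer = AutoTokenizer.from_pretrained('spanbert_hf_base/')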