python · pytorch · huggingface-transformers

Is there a way to use a pre-trained transformers model without the configuration file?


I would like to fine-tune a pre-trained transformers model on Question Answering. The model was pre-trained on large engineering & science related corpora.

I have been provided a "checkpoint.pt" file containing the weights of the model. I was also given a "bert_config.json" file, but I am not sure whether it is the correct configuration file for this checkpoint.

from transformers import AutoModel, AutoTokenizer, AutoConfig

MODEL_PATH = "./checkpoint.pt"
config = AutoConfig.from_pretrained("./bert_config.json")
model = AutoModel.from_pretrained(MODEL_PATH, config=config)

The reason I believe that bert_config.json does not match the "./checkpoint.pt" file is that, when I load the model with the code above, I get the following warning:

Some weights of the model checkpoint at ./aerobert/phase2_ckpt_4302592.pt were not used when initializing BertModel: ['files', 'optimizer', 'model', 'master params']

  • This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). Some weights of BertModel were not initialized from the model checkpoint at ./aerobert/phase2_ckpt_4302592.pt and are newly initialized: ['encoder.layer.2.attention.output.LayerNorm.weight', 'encoder.layer.6.output.LayerNorm.bias', 'encoder.layer.7.intermediate.dense.bias', 'encoder.layer.2.output.LayerNorm.bias', 'encoder.layer.21.attention.self.value.bias', 'encoder.layer.11.attention.self.value.bias', ............

If I am correct in assuming that "bert_config.json" is not the correct one, is there a way to load this model correctly without the config.json file?

Is there a way to see the model architecture from the saved weights in the checkpoint.pt file?


Solution

  • This is a warning message, not an error.

    It means the checkpoint was saved from a model pretrained on some task (masked language modeling, next-sentence prediction, etc.). If you load it into a model with exactly the same architecture, the message is NOT expected; if the architectures differ (for example, initializing a plain BertModel from a BertForPreTraining checkpoint), it IS expected, because the task-specific heads and pooler weights in the checkpoint are simply not used during fine-tuning.
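In this case, though, the unused keys in the warning ('files', 'optimizer', 'model', 'master params') suggest that checkpoint.pt is a full training checkpoint wrapping the weights together with optimizer state, rather than a bare state dict. A quick way to check is to load the file with torch.load and look at its top-level keys. A sketch (a tiny checkpoint is fabricated here for illustration; the "model" key name is an assumption about how your file is laid out):

```python
import torch

# Illustration only: a training checkpoint often wraps the model weights
# together with optimizer state and metadata under separate keys.
# (We fabricate a tiny one; in practice you would torch.load your
# real "./checkpoint.pt".)
dummy = {
    "model": {"embeddings.weight": torch.zeros(2, 4)},
    "optimizer": {},
    "files": [],
}
torch.save(dummy, "checkpoint_demo.pt")

checkpoint = torch.load("checkpoint_demo.pt", map_location="cpu")
print(list(checkpoint.keys()))  # reveals the top-level structure of the file

# If a "model" entry exists, that is likely the actual state dict.
state_dict = checkpoint.get("model", checkpoint)
print(list(state_dict.keys()))
```

If the top-level keys are things like 'model' and 'optimizer' rather than parameter names like 'embeddings.word_embeddings.weight', the file is a wrapped checkpoint and the inner dict is what you actually want to load.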

    But this message does not by itself mean that bert_config.json is the wrong one. You can test it in Hugging Face's official Colab notebook.
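If the config does turn out to match, one way to use it with a wrapped checkpoint is to build the model from the config, extract the inner state dict, and load it with strict=False, then inspect what was missing or unexpected. A minimal sketch (the small BertConfig values here are placeholders so the example is self-contained; with the real files you would use BertConfig.from_json_file and torch.load instead, as noted in the comments):

```python
import torch
from transformers import BertConfig, BertModel

# Placeholder config for demonstration; in practice:
#   config = BertConfig.from_json_file("./bert_config.json")
config = BertConfig(hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64,
                    vocab_size=100)
model = BertModel(config)

# Simulate a training checkpoint that wraps the weights under "model";
# in practice:
#   checkpoint = torch.load("./checkpoint.pt", map_location="cpu")
checkpoint = {"model": model.state_dict(), "optimizer": {}}

# Fall back to the whole dict if there is no "model" wrapper.
state_dict = checkpoint.get("model", checkpoint)

# strict=False reports mismatches instead of raising.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing:", len(missing), "unexpected:", len(unexpected))
```

Large lists of missing or unexpected keys here are a much stronger signal of a config mismatch than the generic warning in the question.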


    You can find more information in this issue.
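As for reading the architecture off the saved weights: you cannot recover a complete config from a state dict alone, but the parameter names and tensor shapes do reveal the main hyper-parameters (number of encoder layers, hidden size, vocabulary size), which you can then compare against bert_config.json. A sketch, assuming standard BERT parameter names; the demo state dict below is fabricated for illustration:

```python
import re
import torch

def describe_bert_state_dict(state_dict):
    """Infer basic BERT hyper-parameters from parameter names and shapes."""
    # Number of encoder layers: highest layer index mentioned, plus one.
    layer_ids = [
        int(m.group(1))
        for key in state_dict
        for m in [re.search(r"encoder\.layer\.(\d+)\.", key)]
        if m
    ]
    num_layers = max(layer_ids) + 1 if layer_ids else None

    # Vocab size and hidden size come from the word-embedding matrix,
    # assuming the standard BERT parameter name.
    emb = state_dict.get("embeddings.word_embeddings.weight")
    vocab_size, hidden_size = emb.shape if emb is not None else (None, None)
    return {"num_layers": num_layers,
            "vocab_size": vocab_size,
            "hidden_size": hidden_size}

# Tiny fabricated state dict for demonstration.
demo = {
    "embeddings.word_embeddings.weight": torch.zeros(30522, 768),
    "encoder.layer.0.attention.self.query.weight": torch.zeros(768, 768),
    "encoder.layer.11.output.dense.weight": torch.zeros(768, 3072),
}
print(describe_bert_state_dict(demo))
```

If the values reported this way disagree with hidden_size, num_hidden_layers, or vocab_size in bert_config.json, the config file does not belong to the checkpoint.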