
Correct Way to Fine-Tune/Train HuggingFace's Model from scratch (PyTorch)


For example, I want to train a BERT model from scratch, but using the existing configuration. Is the following code the correct way to do so?

from transformers import BertModel

model = BertModel.from_pretrained('bert-base-cased')
model.init_weights()

I ask because I think the init_weights method will re-initialize all the weights.
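Whether init_weights actually re-randomizes everything is version-dependent in transformers (some releases skip modules that were already initialized from the checkpoint), so it is worth checking empirically. A minimal sketch of such a check, comparing one parameter before and after the call:

import torch
from transformers import BertModel

model = BertModel.from_pretrained('bert-base-cased')
# Snapshot one parameter tensor before re-initialization
before = model.embeddings.word_embeddings.weight.detach().clone()

model.init_weights()  # re-runs the random initialization (behavior is version-dependent)

after = model.embeddings.word_embeddings.weight.detach()
# Prints False if the weights were actually re-randomized
print(torch.equal(before, after))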

Second question: what if I want to change the configuration a bit, such as the number of hidden layers?

model = BertModel.from_pretrained('bert-base-cased', num_hidden_layers=10)
model.init_weights()

I wonder if the above is the correct way to do it, since neither snippet raises an error when I run it.
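Running without an error only shows that the extra keyword was accepted, not that it changed the model. A quick sanity check, sketched here under the assumption of a recent transformers version, is to inspect the resulting architecture:

from transformers import BertModel

model = BertModel.from_pretrained('bert-base-cased', num_hidden_layers=10)
# Confirm the override took effect: 10 encoder blocks instead of the usual 12
print(model.config.num_hidden_layers)  # 10
print(len(model.encoder.layer))        # 10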


Solution

  • In this way, you would unnecessarily download and load the pre-trained model weights. You can avoid that by downloading only the BERT config:

    import transformers

    config = transformers.AutoConfig.from_pretrained("bert-base-cased")
    model = transformers.AutoModel.from_config(config)
    

    Both yours and this solution assume you want to tokenize the input in the same way as the original BERT and use the same vocabulary. If you want to use a different vocabulary, you can change it in the config before instantiating the model:

    config.vocab_size = 123456
    

    Similarly, you can change any other hyperparameter that you want to differ from the original BERT; a combined sketch follows below.
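
    Putting the pieces together, a minimal sketch of the from-scratch setup (the 10-layer and vocab-size values are just illustrative; a changed vocab_size only makes sense together with a tokenizer trained on the matching vocabulary):

    import transformers

    config = transformers.AutoConfig.from_pretrained("bert-base-cased")
    config.num_hidden_layers = 10  # example override: fewer layers than the original 12
    config.vocab_size = 123456     # example override: requires a matching tokenizer

    # Randomly initialized model with the modified configuration; no
    # pre-trained weights are downloaded or loaded here.
    model = transformers.AutoModel.from_config(config)
    print(model.config.num_hidden_layers, model.config.vocab_size)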