Tags: nlp, pytorch, huggingface-transformers, bert-language-model

Are the pre-trained layers of the Huggingface BERT models frozen?


I use the following classification model from Huggingface:

from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("dbmdz/bert-base-german-cased", num_labels=2).to(device)

As I understand it, this adds a dense layer with 2 output nodes on top of the pre-trained model. But are all the pre-trained layers below it frozen, or are they also updated during fine-tuning? I can't find any information about that in the docs.
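For a BERT checkpoint, AutoModelForSequenceClassification resolves to BertForSequenceClassification, which keeps the pre-trained body under `model.bert` and adds a linear `classifier` head with `num_labels` outputs. The sketch below inspects that head. To keep it runnable offline it assumes a tiny, randomly initialised BertConfig (the sizes are hypothetical) instead of downloading the dbmdz weights; the head has the same structure either way.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Hypothetical tiny config so the example runs without downloading
# "dbmdz/bert-base-german-cased"; the weights here are random.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,
)
model = BertForSequenceClassification(config)

# The pre-trained body lives in model.bert; the new head is model.classifier.
print(model.classifier)  # Linear(in_features=32, out_features=2, bias=True)
```

With the real checkpoint, `in_features` would be the model's hidden size rather than 32.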

So do I still have to do something like this?

for param in model.bert.parameters():
    param.requires_grad = False
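If you do want to train only the head, the loop above works, and it also freezes the pooler because the pooler is part of `model.bert`. The sketch below, again assuming a hypothetical tiny randomly initialised BertConfig instead of the real checkpoint, applies that loop, confirms that only the classifier parameters remain trainable, and hands just those to the optimizer.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Hypothetical tiny config (random weights) so this runs offline.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertForSequenceClassification(config)

# Freeze the whole BERT body (embeddings, encoder layers and pooler).
for param in model.bert.parameters():
    param.requires_grad = False

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['classifier.weight', 'classifier.bias']

# Pass only the parameters that still require gradients to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```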

Solution

  • They are not frozen. All parameters, including those of the pre-trained BERT body, are trainable by default and are updated during fine-tuning. You can check this with:

    for name, param in model.named_parameters():
        print(name, param.requires_grad)
    

    Output:

    bert.embeddings.word_embeddings.weight True
    bert.embeddings.position_embeddings.weight True
    bert.embeddings.token_type_embeddings.weight True
    bert.embeddings.LayerNorm.weight True
    bert.embeddings.LayerNorm.bias True
    bert.encoder.layer.0.attention.self.query.weight True
    bert.encoder.layer.0.attention.self.query.bias True
    bert.encoder.layer.0.attention.self.key.weight True
    bert.encoder.layer.0.attention.self.key.bias True
    bert.encoder.layer.0.attention.self.value.weight True
    bert.encoder.layer.0.attention.self.value.bias True
    bert.encoder.layer.0.attention.output.dense.weight True
    bert.encoder.layer.0.attention.output.dense.bias True
    bert.encoder.layer.0.attention.output.LayerNorm.weight True
    bert.encoder.layer.0.attention.output.LayerNorm.bias True
    bert.encoder.layer.0.intermediate.dense.weight True
    bert.encoder.layer.0.intermediate.dense.bias True
    bert.encoder.layer.0.output.dense.weight True
    bert.encoder.layer.0.output.dense.bias True
    bert.encoder.layer.0.output.LayerNorm.weight True
    bert.encoder.layer.0.output.LayerNorm.bias True
    bert.encoder.layer.1.attention.self.query.weight True
    bert.encoder.layer.1.attention.self.query.bias True
    bert.encoder.layer.1.attention.self.key.weight True
    bert.encoder.layer.1.attention.self.key.bias True
    bert.encoder.layer.1.attention.self.value.weight True
    bert.encoder.layer.1.attention.self.value.bias True
    bert.encoder.layer.1.attention.output.dense.weight True
    bert.encoder.layer.1.attention.output.dense.bias True
    bert.encoder.layer.1.attention.output.LayerNorm.weight True
    bert.encoder.layer.1.attention.output.LayerNorm.bias True
    bert.encoder.layer.1.intermediate.dense.weight True
    bert.encoder.layer.1.intermediate.dense.bias True
    bert.encoder.layer.1.output.dense.weight True
    bert.encoder.layer.1.output.dense.bias True
    bert.encoder.layer.1.output.LayerNorm.weight True
    bert.encoder.layer.1.output.LayerNorm.bias True
    bert.encoder.layer.2.attention.self.query.weight True
    bert.encoder.layer.2.attention.self.query.bias True
    bert.encoder.layer.2.attention.self.key.weight True
    bert.encoder.layer.2.attention.self.key.bias True
    bert.encoder.layer.2.attention.self.value.weight True
    bert.encoder.layer.2.attention.self.value.bias True
    bert.encoder.layer.2.attention.output.dense.weight True
    bert.encoder.layer.2.attention.output.dense.bias True
    bert.encoder.layer.2.attention.output.LayerNorm.weight True
    bert.encoder.layer.2.attention.output.LayerNorm.bias True
    bert.encoder.layer.2.intermediate.dense.weight True
    bert.encoder.layer.2.intermediate.dense.bias True
    bert.encoder.layer.2.output.dense.weight True
    bert.encoder.layer.2.output.dense.bias True
    bert.encoder.layer.2.output.LayerNorm.weight True
    bert.encoder.layer.2.output.LayerNorm.bias True
    bert.encoder.layer.3.attention.self.query.weight True
    bert.encoder.layer.3.attention.self.query.bias True
    bert.encoder.layer.3.attention.self.key.weight True
    bert.encoder.layer.3.attention.self.key.bias True
    bert.encoder.layer.3.attention.self.value.weight True
    bert.encoder.layer.3.attention.self.value.bias True
    bert.encoder.layer.3.attention.output.dense.weight True
    bert.encoder.layer.3.attention.output.dense.bias True
    bert.encoder.layer.3.attention.output.LayerNorm.weight True
    bert.encoder.layer.3.attention.output.LayerNorm.bias True
    bert.encoder.layer.3.intermediate.dense.weight True
    bert.encoder.layer.3.intermediate.dense.bias True
    bert.encoder.layer.3.output.dense.weight True
    bert.encoder.layer.3.output.dense.bias True
    bert.encoder.layer.3.output.LayerNorm.weight True
    bert.encoder.layer.3.output.LayerNorm.bias True
    bert.encoder.layer.4.attention.self.query.weight True
    bert.encoder.layer.4.attention.self.query.bias True
    bert.encoder.layer.4.attention.self.key.weight True
    bert.encoder.layer.4.attention.self.key.bias True
    bert.encoder.layer.4.attention.self.value.weight True
    bert.encoder.layer.4.attention.self.value.bias True
    bert.encoder.layer.4.attention.output.dense.weight True
    bert.encoder.layer.4.attention.output.dense.bias True
    bert.encoder.layer.4.attention.output.LayerNorm.weight True
    bert.encoder.layer.4.attention.output.LayerNorm.bias True
    bert.encoder.layer.4.intermediate.dense.weight True
    bert.encoder.layer.4.intermediate.dense.bias True
    bert.encoder.layer.4.output.dense.weight True
    bert.encoder.layer.4.output.dense.bias True
    bert.encoder.layer.4.output.LayerNorm.weight True
    bert.encoder.layer.4.output.LayerNorm.bias True
    bert.encoder.layer.5.attention.self.query.weight True
    bert.encoder.layer.5.attention.self.query.bias True
    bert.encoder.layer.5.attention.self.key.weight True
    bert.encoder.layer.5.attention.self.key.bias True
    bert.encoder.layer.5.attention.self.value.weight True
    bert.encoder.layer.5.attention.self.value.bias True
    bert.encoder.layer.5.attention.output.dense.weight True
    bert.encoder.layer.5.attention.output.dense.bias True
    bert.encoder.layer.5.attention.output.LayerNorm.weight True
    bert.encoder.layer.5.attention.output.LayerNorm.bias True
    bert.encoder.layer.5.intermediate.dense.weight True
    bert.encoder.layer.5.intermediate.dense.bias True
    bert.encoder.layer.5.output.dense.weight True
    bert.encoder.layer.5.output.dense.bias True
    bert.encoder.layer.5.output.LayerNorm.weight True
    bert.encoder.layer.5.output.LayerNorm.bias True
    bert.encoder.layer.6.attention.self.query.weight True
    bert.encoder.layer.6.attention.self.query.bias True
    bert.encoder.layer.6.attention.self.key.weight True
    bert.encoder.layer.6.attention.self.key.bias True
    bert.encoder.layer.6.attention.self.value.weight True
    bert.encoder.layer.6.attention.self.value.bias True
    bert.encoder.layer.6.attention.output.dense.weight True
    bert.encoder.layer.6.attention.output.dense.bias True
    bert.encoder.layer.6.attention.output.LayerNorm.weight True
    bert.encoder.layer.6.attention.output.LayerNorm.bias True
    bert.encoder.layer.6.intermediate.dense.weight True
    bert.encoder.layer.6.intermediate.dense.bias True
    bert.encoder.layer.6.output.dense.weight True
    bert.encoder.layer.6.output.dense.bias True
    bert.encoder.layer.6.output.LayerNorm.weight True
    bert.encoder.layer.6.output.LayerNorm.bias True
    bert.encoder.layer.7.attention.self.query.weight True
    bert.encoder.layer.7.attention.self.query.bias True
    bert.encoder.layer.7.attention.self.key.weight True
    bert.encoder.layer.7.attention.self.key.bias True
    bert.encoder.layer.7.attention.self.value.weight True
    bert.encoder.layer.7.attention.self.value.bias True
    bert.encoder.layer.7.attention.output.dense.weight True
    bert.encoder.layer.7.attention.output.dense.bias True
    bert.encoder.layer.7.attention.output.LayerNorm.weight True
    bert.encoder.layer.7.attention.output.LayerNorm.bias True
    bert.encoder.layer.7.intermediate.dense.weight True
    bert.encoder.layer.7.intermediate.dense.bias True
    bert.encoder.layer.7.output.dense.weight True
    bert.encoder.layer.7.output.dense.bias True
    bert.encoder.layer.7.output.LayerNorm.weight True
    bert.encoder.layer.7.output.LayerNorm.bias True
    bert.encoder.layer.8.attention.self.query.weight True
    bert.encoder.layer.8.attention.self.query.bias True
    bert.encoder.layer.8.attention.self.key.weight True
    bert.encoder.layer.8.attention.self.key.bias True
    bert.encoder.layer.8.attention.self.value.weight True
    bert.encoder.layer.8.attention.self.value.bias True
    bert.encoder.layer.8.attention.output.dense.weight True
    bert.encoder.layer.8.attention.output.dense.bias True
    bert.encoder.layer.8.attention.output.LayerNorm.weight True
    bert.encoder.layer.8.attention.output.LayerNorm.bias True
    bert.encoder.layer.8.intermediate.dense.weight True
    bert.encoder.layer.8.intermediate.dense.bias True
    bert.encoder.layer.8.output.dense.weight True
    bert.encoder.layer.8.output.dense.bias True
    bert.encoder.layer.8.output.LayerNorm.weight True
    bert.encoder.layer.8.output.LayerNorm.bias True
    bert.encoder.layer.9.attention.self.query.weight True
    bert.encoder.layer.9.attention.self.query.bias True
    bert.encoder.layer.9.attention.self.key.weight True
    bert.encoder.layer.9.attention.self.key.bias True
    bert.encoder.layer.9.attention.self.value.weight True
    bert.encoder.layer.9.attention.self.value.bias True
    bert.encoder.layer.9.attention.output.dense.weight True
    bert.encoder.layer.9.attention.output.dense.bias True
    bert.encoder.layer.9.attention.output.LayerNorm.weight True
    bert.encoder.layer.9.attention.output.LayerNorm.bias True
    bert.encoder.layer.9.intermediate.dense.weight True
    bert.encoder.layer.9.intermediate.dense.bias True
    bert.encoder.layer.9.output.dense.weight True
    bert.encoder.layer.9.output.dense.bias True
    bert.encoder.layer.9.output.LayerNorm.weight True
    bert.encoder.layer.9.output.LayerNorm.bias True
    bert.encoder.layer.10.attention.self.query.weight True
    bert.encoder.layer.10.attention.self.query.bias True
    bert.encoder.layer.10.attention.self.key.weight True
    bert.encoder.layer.10.attention.self.key.bias True
    bert.encoder.layer.10.attention.self.value.weight True
    bert.encoder.layer.10.attention.self.value.bias True
    bert.encoder.layer.10.attention.output.dense.weight True
    bert.encoder.layer.10.attention.output.dense.bias True
    bert.encoder.layer.10.attention.output.LayerNorm.weight True
    bert.encoder.layer.10.attention.output.LayerNorm.bias True
    bert.encoder.layer.10.intermediate.dense.weight True
    bert.encoder.layer.10.intermediate.dense.bias True
    bert.encoder.layer.10.output.dense.weight True
    bert.encoder.layer.10.output.dense.bias True
    bert.encoder.layer.10.output.LayerNorm.weight True
    bert.encoder.layer.10.output.LayerNorm.bias True
    bert.encoder.layer.11.attention.self.query.weight True
    bert.encoder.layer.11.attention.self.query.bias True
    bert.encoder.layer.11.attention.self.key.weight True
    bert.encoder.layer.11.attention.self.key.bias True
    bert.encoder.layer.11.attention.self.value.weight True
    bert.encoder.layer.11.attention.self.value.bias True
    bert.encoder.layer.11.attention.output.dense.weight True
    bert.encoder.layer.11.attention.output.dense.bias True
    bert.encoder.layer.11.attention.output.LayerNorm.weight True
    bert.encoder.layer.11.attention.output.LayerNorm.bias True
    bert.encoder.layer.11.intermediate.dense.weight True
    bert.encoder.layer.11.intermediate.dense.bias True
    bert.encoder.layer.11.output.dense.weight True
    bert.encoder.layer.11.output.dense.bias True
    bert.encoder.layer.11.output.LayerNorm.weight True
    bert.encoder.layer.11.output.LayerNorm.bias True
    bert.pooler.dense.weight True
    bert.pooler.dense.bias True
    classifier.weight True
    classifier.bias True
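Instead of scanning the full listing, you can compare trainable and total parameter counts, and confirm that a training step really changes a pre-trained weight. This is a sketch assuming a hypothetical tiny randomly initialised BertConfig, random token IDs and made-up labels; with the real checkpoint the body's weights would be updated in the same way.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

torch.manual_seed(0)
# Hypothetical tiny config (random weights) so this runs offline.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertForSequenceClassification(config)
model.train()

# Nothing is frozen by default, so every parameter counts as trainable.
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable == total)  # True

# Snapshot a weight from the BERT body before one optimizer step.
query = model.bert.encoder.layer[0].attention.self.query.weight
before = query.detach().clone()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
input_ids = torch.randint(0, config.vocab_size, (4, 8))  # made-up batch
labels = torch.tensor([0, 1, 0, 1])                       # made-up labels
loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()
optimizer.step()

print(torch.equal(before, query))  # False: the body was updated as well
```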