I believe I’m correctly following HuggingFace’s documentation on fine-tuning pretrained models, but I get a model with 100% trainable parameters. I thought only some layers would be unfrozen and optimized, but it looks like all of them are.
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}"
    )
...
# id2label and label2id represent 3 classes in my current problem
model_name = "nvidia/segformer-b5-finetuned-cityscapes-1024-1024"
model = AutoModelForSemanticSegmentation.from_pretrained(model_name, id2label=id2label, label2id=label2id, ignore_mismatched_sizes=True)
print_trainable_parameters(model)
This prints the following:
Some weights of SegformerForSemanticSegmentation were not initialized from the model checkpoint at nvidia/segformer-b5-finetuned-cityscapes-1024-1024 and are newly initialized because the shapes did not match:
- decode_head.classifier.weight: found shape torch.Size([19, 768, 1, 1]) in the checkpoint and torch.Size([3, 768, 1, 1]) in the model instantiated
- decode_head.classifier.bias: found shape torch.Size([19]) in the checkpoint and torch.Size([3]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
trainable params: 84595651 || all params: 84595651 || trainable%: 100.00
Why are 100% of the parameters trainable? I could use PEFT to reduce the number of trainable parameters, but based on the warning message about the decode_head.classifier layer, I expected that only a small subset of the parameters would be free to be optimized.
This is the expected behavior. The library does not freeze any layers for you: from_pretrained loads every parameter with requires_grad=True. The warning only tells you that the classifier head was reinitialized because its shape changed (19 Cityscapes classes vs. your 3 classes); it says nothing about freezing the rest of the model. You can freeze layers yourself by setting requires_grad to False
for the parameters you want fixed, as shown below:
from transformers import AutoModelForSemanticSegmentation
model_name = "nvidia/segformer-b5-finetuned-cityscapes-1024-1024"
model = AutoModelForSemanticSegmentation.from_pretrained(model_name)
print_trainable_parameters(model)
# freeze everything except the decoder head
for name, param in model.named_parameters():
    if not name.startswith("decode_head"):
        param.requires_grad = False
print_trainable_parameters(model)
Output:
trainable params: 84607955 || all params: 84607955 || trainable%: 100.00
trainable params: 3164947 || all params: 84607955 || trainable%: 3.74
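The same pattern works for any torch model, not just SegFormer. Here is a minimal, self-contained sketch using a toy nn.Sequential in place of the pretrained checkpoint (the module name "decode_head" is chosen to mirror the example above; the layer sizes are arbitrary):

```python
import torch.nn as nn

# Toy stand-in: a "backbone" plus a freshly initialized "decode_head"
model = nn.Sequential()
model.add_module("backbone", nn.Linear(16, 8))    # 16*8 + 8 = 136 params
model.add_module("decode_head", nn.Linear(8, 3))  # 8*3 + 3 = 27 params

# Freeze everything except the decoder head, same as above
for name, param in model.named_parameters():
    if not name.startswith("decode_head"):
        param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(trainable, total)  # 27 163
```

Only the head's 27 parameters remain trainable; the optimizer will then skip the frozen backbone weights (or you can pass `filter(lambda p: p.requires_grad, model.parameters())` to it explicitly).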