Tags: python, nlp, pytorch, bert-language-model, transfer-learning

How to save only the classifier layer's parameters of a pretrained BERT model, due to memory concerns?


I fine-tuned the pretrained model by freezing all layers except the classifier layer, and I saved the weights as a .bin file using PyTorch.

Now, instead of loading the ~400 MB fine-tuned model, is there a way to load only the parameters of the classifier layer that I retrained? I know I still have to load the original pretrained model; I just don't want to load the entire fine-tuned model, due to memory concerns.

I can access the last layer's parameters from the state_dict as shown below, but how can I save them in a separate file so I can reuse them later with less memory?

import torch

# Rebuild the model, then load the full fine-tuned state dict
model = PosTaggingModel(num_pos_tag=num_pos_tag)
state_dict = torch.load("model.bin")
print("state dictionary:", state_dict)

# Copy only the classifier layer's weights into the fresh model
with torch.no_grad():
    model.out_pos_tag.weight.copy_(state_dict['out_pos_tag.weight'])
    model.out_pos_tag.bias.copy_(state_dict['out_pos_tag.bias'])
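
I suppose I could filter the full state_dict down to just those entries before saving, something like the sketch below (the key names are taken from my snippet above, and the filename is just a placeholder), but I'm not sure this is the intended way:

# Keep only the classifier entries from the full state dict
classifier_state = {k: v for k, v in state_dict.items()
                    if k.startswith('out_pos_tag.')}
# Save the much smaller dict on its own (placeholder filename)
torch.save(classifier_state, 'classifier_only.bin')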

Here is the model class:

import torch.nn as nn
from transformers import AutoModel

class PosTaggingModel(nn.Module):
    def __init__(self, num_pos_tag):
        super(PosTaggingModel, self).__init__()
        self.num_pos_tag = num_pos_tag
        self.model = AutoModel.from_pretrained("dbmdz/bert-base-turkish-cased")
        # Freeze everything except the classifier layer
        for name, param in self.model.named_parameters():
            if 'classifier' not in name:
                param.requires_grad = False
        self.bert_drop = nn.Dropout(0.3)
        self.out_pos_tag = nn.Linear(768, self.num_pos_tag)

    def forward(self, ids, mask, token_type_ids, target_pos_tag):
        # o1: per-token hidden states from the BERT encoder
        o1, _ = self.model(ids, attention_mask=mask, token_type_ids=token_type_ids)

        bo_pos_tag = self.bert_drop(o1)
        pos_tag = self.out_pos_tag(bo_pos_tag)

        # loss_fn is defined elsewhere in the training code
        loss = loss_fn(pos_tag, target_pos_tag, mask, self.num_pos_tag)
        return pos_tag, loss

I don't know if this is possible, but I'm just looking for a way to save and reuse the last layer's parameters without needing the parameters of the frozen layers. I couldn't find anything about this in the documentation. Thanks in advance to anyone who can help.


Solution

  • You can do it like this:

    import torch
    
    # Create a dummy model
    class Classifier(torch.nn.Module):
      def __init__(self):
        super(Classifier, self).__init__()
        self.first = torch.nn.Linear(10, 10)
        self.second = torch.nn.Linear(10, 20)
        self.last = torch.nn.Linear(20, 1)
      
      def forward(self, x):
        pass  # not needed for this demo
    
    # Instantiate it
    model = Classifier()
    
    # Extract the layer to save
    to_save = model.last
    
    # Save only that layer's state dict
    torch.save(to_save.state_dict(), './classifier.bin')
    
    # Later, recreate the model...
    model = Classifier()
    
    # ...and load the saved layer back into it
    model.last.load_state_dict(torch.load('./classifier.bin'))
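
Applied to the PosTaggingModel from the question, the same pattern would look roughly like the sketch below (this assumes the class definition above; the frozen BERT weights are restored by from_pretrained whenever the model is rebuilt, so only the small head file needs to be kept, and the filename is a placeholder):

    # Save only the retrained classifier head (kilobytes instead of ~400 MB)
    model = PosTaggingModel(num_pos_tag=num_pos_tag)
    torch.save(model.out_pos_tag.state_dict(), './pos_tag_head.bin')
    
    # Later: rebuild the model (BERT weights come from from_pretrained)
    # and restore just the classifier head
    model = PosTaggingModel(num_pos_tag=num_pos_tag)
    model.out_pos_tag.load_state_dict(torch.load('./pos_tag_head.bin'))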