I have the following code to create a custom model for named-entity recognition. Using ChatGPT and Copilot, I've commented it to understand its functionality.
However, the role of config inside super().__init__(config) is not clear to me. What role does it play, given that we have already used XLMRobertaConfig at the beginning?
import torch.nn as nn
from transformers import XLMRobertaConfig
from transformers.modeling_outputs import TokenClassifierOutput
from transformers.models.roberta.modeling_roberta import RobertaModel
from transformers.models.roberta.modeling_roberta import RobertaPreTrainedModel

# Custom model class that inherits from RobertaPreTrainedModel, since we want to
# reuse the weights of a pretrained model in the body of the custom model
class XLMRobertaForTokenClassification(RobertaPreTrainedModel):
    # Common practice in 🤗 Transformers:
    # ties this model class to XLMRobertaConfig, so the configuration functionality
    # and attributes come from that class
    config_class = XLMRobertaConfig

    # initialize the model
    def __init__(self, config):
        # call the initialization function of the parent class (RobertaPreTrainedModel);
        # config is needed here so the parent class is set up with the correct configuration
        super().__init__(config)
        self.num_labels = config.num_labels  # number of classes to predict

        # Load model BODY
        self.roberta = RobertaModel(config, add_pooling_layer=False)  # returns all hidden states, not just [CLS]

        # Set up token CLASSIFICATION HEAD
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        # The linear layer takes a tensor of shape (batch_size, sequence_length, hidden_size)
        # and produces an output of shape (batch_size, sequence_length, num_labels),
        # i.e. unnormalized scores (logits) over the labels for each token in the input sequence.

        # Load the pretrained weights for the model body and
        # randomly initialize the weights of the token classification head
        self.init_weights()

    # define the forward pass
    def forward(self, input_ids=None, attention_mask=None, token_type_ids=None,
                labels=None, **kwargs):
        # Feed the data through the model BODY to get the encoder representations
        outputs = self.roberta(input_ids, attention_mask=attention_mask,
                               token_type_ids=token_type_ids, **kwargs)

        # Apply the classifier to the encoder representation
        sequence_output = self.dropout(outputs[0])  # dropout on the first element of the output, i.e. last_hidden_state
        logits = self.classifier(sequence_output)   # linear transformation to get the logits (raw model outputs)

        # Calculate the loss if labels are provided
        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))  # cross entropy on flattened logits and labels

        # Return a model output object
        return TokenClassifierOutput(loss=loss, logits=logits,
                                     hidden_states=outputs.hidden_states,
                                     attentions=outputs.attentions)
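For context, this is roughly how I exercise the class above. It's a minimal sketch; the checkpoint name, the number of labels, and the example sentence are placeholders I picked, not something from the book:

import torch
from transformers import AutoConfig, AutoTokenizer

# Sketch: run one sentence through the custom model (checkpoint and num_labels are assumptions)
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
config = AutoConfig.from_pretrained("xlm-roberta-base", num_labels=7)
model = XLMRobertaForTokenClassification.from_pretrained("xlm-roberta-base", config=config)

inputs = tokenizer("Jack Sparrow loves New York!", return_tensors="pt")
with torch.no_grad():
    output = model(**inputs)
print(output.logits.shape)  # torch.Size([1, sequence_length, 7])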
EDIT: I quote directly from the book I'm working through: "config_class ensures that the standard XLMRobertaConfig settings are used when initializing a new model". If I understand this correctly, could we change these default parameters by overriding the default settings in the config?
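For instance (a sketch with arbitrary values, not taken from the book), I imagine overriding the defaults would look like this:

from transformers import XLMRobertaConfig

# Sketch: override selected defaults of XLMRobertaConfig (the values here are arbitrary examples)
config = XLMRobertaConfig(num_labels=7, hidden_dropout_prob=0.2)
print(config.num_labels)           # 7 (overridden)
print(config.hidden_dropout_prob)  # 0.2 (overridden)
print(config.hidden_size)          # 768 (unchanged default)

# A model built from this config picks up the overridden values,
# but its weights are randomly initialised (no pretrained weights are loaded here)
model = XLMRobertaForTokenClassification(config)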
In your code, config_class doesn't contain any configuration parameters. It only contains XLMRobertaConfig, which is a class (/!\ not an instance of that class).
I'm not sure exactly how RobertaPreTrainedModel works, but it seems that, when you initialise an instance of XLMRobertaForTokenClassification, you need to give it the actual contents of the configuration (perhaps as a config object or a dictionary). The class attribute config_class, on the other hand, doesn't know anything about the values set in that configuration.
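To make the class-vs-instance distinction concrete, here is a small sketch (the num_labels value is just an example):

from transformers import XLMRobertaConfig

# config_class refers to the class itself and carries no values
print(XLMRobertaForTokenClassification.config_class is XLMRobertaConfig)  # True

# only an instance of that class actually holds configuration values
config = XLMRobertaForTokenClassification.config_class(num_labels=7)
print(config.num_labels)  # 7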
Edit: Taken from the code of PreTrainedModel, which RobertaPreTrainedModel inherits from:
if not isinstance(config, PretrainedConfig):
    config_path = config if config is not None else pretrained_model_name_or_path
    config, model_kwargs = cls.config_class.from_pretrained(
        config_path,
        cache_dir=cache_dir,
        return_unused_kwargs=True,
        force_download=force_download,
        resume_download=resume_download,
        proxies=proxies,
        local_files_only=local_files_only,
        use_auth_token=use_auth_token,
        revision=revision,
        subfolder=subfolder,
        _from_auto=from_auto_class,
        _from_pipeline=from_pipeline,
        **kwargs,
    )
(Taken from here.) The config_class is used as a fallback: if no valid config is given, it is used to instantiate a config with default values.
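In practice the fallback means something like this (the checkpoint name is just an example):

# Explicit config: from_pretrained uses the object you pass in
config = XLMRobertaConfig.from_pretrained("xlm-roberta-base", num_labels=7)
model = XLMRobertaForTokenClassification.from_pretrained("xlm-roberta-base", config=config)

# No config given: from_pretrained falls back to cls.config_class (XLMRobertaConfig)
# and builds a config with the default values stored in the checkpoint's config.json
model = XLMRobertaForTokenClassification.from_pretrained("xlm-roberta-base")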