I have the following code to create a custom model for named-entity recognition. Using ChatGPT and Copilot, I've commented it to understand its functionality.
However, the role of config inside super().__init__(config) is not clear to me. What role does it play, given that we have already used XLMRobertaConfig at the beginning?
import torch.nn as nn
from transformers import XLMRobertaConfig
from transformers.modeling_outputs import TokenClassifierOutput
from transformers.models.roberta.modeling_roberta import RobertaModel
from transformers.models.roberta.modeling_roberta import RobertaPreTrainedModel

# Custom model class that inherits from RobertaPreTrainedModel, since we want to
# reuse the weights of a pretrained model in the body of the custom model
class XLMRobertaForTokenClassification(RobertaPreTrainedModel):
    # Common practice in 🤗 Transformers:
    # ties this model class to XLMRobertaConfig, so the configuration functionality
    # and attributes come from that class
    config_class = XLMRobertaConfig

    # initialize the model
    def __init__(self, config):
        # call the initialization function of the parent class (RobertaPreTrainedModel);
        # config is needed here so the parent class is set up with the correct configuration
        super().__init__(config)
        self.num_labels = config.num_labels  # number of classes to predict

        # Load model BODY
        self.roberta = RobertaModel(config, add_pooling_layer=False)  # returns all hidden states, not just [CLS]

        # Set up token CLASSIFICATION HEAD
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        # The linear layer takes a tensor of shape (batch_size, sequence_length, hidden_size)
        # and produces an output of shape (batch_size, sequence_length, num_labels),
        # i.e. unnormalized scores (logits) over the labels for each token in the input sequence.

        # Load the pretrained weights for the model body and
        # randomly initialize the weights of the token classification head
        self.init_weights()

    # define the forward pass
    def forward(self, input_ids=None, attention_mask=None, token_type_ids=None,
                labels=None, **kwargs):
        # Feed the data through the model BODY to get the encoder representations
        outputs = self.roberta(input_ids, attention_mask=attention_mask,
                               token_type_ids=token_type_ids, **kwargs)

        # Apply the classifier to the encoder representation
        sequence_output = self.dropout(outputs[0])  # dropout on the first element of the output, i.e. last_hidden_state
        logits = self.classifier(sequence_output)   # linear transformation to get the logits (raw model outputs)

        # Calculate the loss if labels are provided
        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))  # cross entropy on flattened logits and labels

        # Return a model output object
        return TokenClassifierOutput(loss=loss, logits=logits,
                                     hidden_states=outputs.hidden_states,
                                     attentions=outputs.attentions)
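For context, this is roughly how I exercise the class above. It's a minimal sketch; the checkpoint name, the number of labels, and the example sentence are placeholders I picked, not something from the book:

import torch
from transformers import AutoConfig, AutoTokenizer

# Sketch: run one sentence through the custom model (checkpoint and num_labels are assumptions)
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
config = AutoConfig.from_pretrained("xlm-roberta-base", num_labels=7)
model = XLMRobertaForTokenClassification.from_pretrained("xlm-roberta-base", config=config)

inputs = tokenizer("Jack Sparrow loves New York!", return_tensors="pt")
with torch.no_grad():
    output = model(**inputs)
print(output.logits.shape)  # torch.Size([1, sequence_length, 7])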
EDIT: I quote directly from the book I'm working through: "config_class ensures that the standard XLMRobertaConfig settings are used when initializing a new model". If I understand this correctly, could we change these default parameters by overriding the default settings in the config?
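For instance (a sketch with arbitrary values, not taken from the book), I imagine overriding the defaults would look like this:

from transformers import XLMRobertaConfig

# Sketch: override selected defaults of XLMRobertaConfig (the values here are arbitrary examples)
config = XLMRobertaConfig(num_labels=7, hidden_dropout_prob=0.2)
print(config.num_labels)           # 7 (overridden)
print(config.hidden_dropout_prob)  # 0.2 (overridden)
print(config.hidden_size)          # 768 (unchanged default)

# A model built from this config picks up the overridden values,
# but its weights are randomly initialised (no pretrained weights are loaded here)
model = XLMRobertaForTokenClassification(config)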
In your code, config_class doesn't contain any configuration parameters. It only contains XLMRobertaConfig, which is a class (/!\ not an instance of that class).
I'm not sure exactly how RobertaPreTrainedModel works, but it seems that, when you initialise an instance of XLMRobertaForTokenClassification, you need to give it the actual contents of the configuration (perhaps as a config object or a dictionary). The class attribute config_class, on the other hand, doesn't know anything about the values set in that configuration.
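To make the class-vs-instance distinction concrete, here is a small sketch (the num_labels value is just an example):

from transformers import XLMRobertaConfig

# config_class refers to the class itself and carries no values
print(XLMRobertaForTokenClassification.config_class is XLMRobertaConfig)  # True

# only an instance of that class actually holds configuration values
config = XLMRobertaForTokenClassification.config_class(num_labels=7)
print(config.num_labels)  # 7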
Edit: Taken from the code of PreTrainedModel, which RobertaPreTrainedModel inherits from:
if not isinstance(config, PretrainedConfig):
    config_path = config if config is not None else pretrained_model_name_or_path
    config, model_kwargs = cls.config_class.from_pretrained(
        config_path,
        cache_dir=cache_dir,
        return_unused_kwargs=True,
        force_download=force_download,
        resume_download=resume_download,
        proxies=proxies,
        local_files_only=local_files_only,
        use_auth_token=use_auth_token,
        revision=revision,
        subfolder=subfolder,
        _from_auto=from_auto_class,
        _from_pipeline=from_pipeline,
        **kwargs,
    )
(Taken from here.) The config_class is used as a fallback: if no valid config is given, it is used to instantiate a config with default values.
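In practice the fallback means something like this (the checkpoint name is just an example):

# Explicit config: from_pretrained uses the object you pass in
config = XLMRobertaConfig.from_pretrained("xlm-roberta-base", num_labels=7)
model = XLMRobertaForTokenClassification.from_pretrained("xlm-roberta-base", config=config)

# No config given: from_pretrained falls back to cls.config_class (XLMRobertaConfig)
# and builds a config with the default values stored in the checkpoint's config.json
model = XLMRobertaForTokenClassification.from_pretrained("xlm-roberta-base")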