# task and model_checkpoint are defined earlier in the notebook
from transformers import AutoModelForSequenceClassification, DistilBertConfig

num_labels = 3 if task.startswith("mnli") else 1 if task == "stsb" else 2

# model1 is built from a bare config with 6 transformer layers
preconfig = DistilBertConfig(n_layers=6, num_labels=num_labels)
model1 = AutoModelForSequenceClassification.from_config(preconfig)
# model2 is built from the pretrained checkpoint
model2 = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)
I am modifying this code (the modified code is provided above) to test different DistilBERT transformer layer depths via from_config, since, as far as I know, from_pretrained uses 6 layers; section 3 of the paper says:

we initialize the student from the teacher by taking one layer out of two

What I want to test is various numbers of layers. To check whether the two functions behave the same, I first ran from_config with n_layers=6, because according to the DistilBertConfig documentation, n_layers determines the transformer block depth. However, when I ran model1 and model2 on the SST-2 dataset, I got the following accuracies:

model1 achieved only 0.8073
model2 achieved 0.901

If both behaved the same, I would expect the results to be similar, but a 10% drop is significant, so I believe there has to be a difference between the two functions. Is there a reason behind this difference (for example, model1 not yet having gone through hyperparameter search), and is there a way to make both functions behave the same? Thank you!
The two functions you described, from_config and from_pretrained, do not behave the same. For a model M with a reference R:

- from_config instantiates a blank model that has the same configuration (the same shape) as your model of choice: M is as R was before training.
- from_pretrained loads a pretrained model that has already been trained on a specific dataset for a given number of epochs: M is as R after training.

To cite the documentation:

Note: Loading a model from its configuration file does not load the model weights. It only affects the model's configuration. Use from_pretrained() to load the model weights.
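As a minimal sketch of this difference (using distilbert-base-uncased as a stand-in for your model_checkpoint), you can check that the two constructors produce the same architecture but different weights:

import torch
from transformers import AutoModelForSequenceClassification, DistilBertConfig

checkpoint = "distilbert-base-uncased"  # stand-in for your model_checkpoint

blank = AutoModelForSequenceClassification.from_config(DistilBertConfig(n_layers=6))
trained = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# Same shapes, different values: from_config initialises randomly,
# from_pretrained loads the checkpoint weights.
w_blank = blank.distilbert.transformer.layer[0].attention.q_lin.weight
w_trained = trained.distilbert.transformer.layer[0].attention.q_lin.weight
print(w_blank.shape == w_trained.shape)    # True
print(torch.allclose(w_blank, w_trained))  # False

This is also why model1 scores so much lower: it is fine-tuned on SST-2 from a random initialisation, without any of the distillation pretraining.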
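If the goal is to test various depths while still starting from the pretrained weights, one option (an assumption on my part, relying on from_pretrained forwarding extra keyword arguments to the config) is to override n_layers when loading the checkpoint; only the first n layers get pretrained weights, and transformers warns about the unused checkpoint weights:

from transformers import AutoModelForSequenceClassification

# Hypothetical depth experiment: a 3-layer student initialised from the
# first 3 layers of the 6-layer pretrained checkpoint.
shallow = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", n_layers=3
)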