Tags: machine-learning, pytorch, huggingface-transformers, huggingface

How to load a smaller GPT2 model on HuggingFace?


I know I can load the smallest GPT2 variant using

from transformers import AutoTokenizer, GPT2LMHeadModel, AutoConfig

tokenizer = AutoTokenizer.from_pretrained("gpt2")
context_length = 128  # example value; use whatever context window you train with

config = AutoConfig.from_pretrained(
    "gpt2",
    vocab_size=len(tokenizer),
    n_ctx=context_length,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
model = GPT2LMHeadModel(config)
model_size = sum(t.numel() for t in model.parameters())
print(f"GPT-2 size: {model_size/1000**2:.1f}M parameters")
>>> GPT-2 size: 124.2M parameters

But how can I load a GPT2 architecture with a smaller number of decoder layers? Say, 3 or 5 instead of the original (I think it's 12)? Note that I'm training this from scratch so I'm not looking for an already pretrained model.


Solution

  • In order to stack 3 or 5 decoder layers rather than the 12 that gpt2 uses by default, it is sufficient to pass n_layer=3 or n_layer=5 as an additional keyword argument to the .from_pretrained() method of the AutoConfig class (GPT2Config under the hood).

    config = AutoConfig.from_pretrained(
        "gpt2",
        vocab_size=len(tokenizer),
        n_ctx=context_length,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        n_layer=3
    )
    

    Alternatively, you can pass num_hidden_layers=3 or num_hidden_layers=5. Thanks to https://github.com/huggingface/transformers/pull/13026, the two names are interchangeable: num_hidden_layers is mapped onto n_layer in the GPT-2 config.

    config = AutoConfig.from_pretrained(
        "gpt2",
        vocab_size=len(tokenizer),
        n_ctx=context_length,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        num_hidden_layers=3
    )