Training Hugging Face's GPT2 from scratch: "assert n_state % config.n_head == 0" error

I am trying to use a GPT2 architecture for musical applications and consequently need to train it from scratch. After a bit of googling I found that issue #1714 on huggingface's GitHub had already "solved" the question. When I try to run the proposed solution:

import numpy as np
from transformers import GPT2Config, GPT2Model

SIZEREDUCTION = 10  # the factor by which we reduce the size of the velocity argument
VELSIZE = int(np.floor(127 / SIZEREDUCTION)) + 1  # number of distinct velocity tokens
SEQLEN = 40  # size of data sequences

# EMBEDSIZE, NUMLAYER and NUMHEAD are defined earlier in the script (not shown here).
config = GPT2Config(vocab_size=VELSIZE, n_positions=SEQLEN, n_embd=EMBEDSIZE, n_layer=NUMLAYER, n_ctx=SEQLEN, n_head=NUMHEAD)
model = GPT2Model(config)  # the assertion below is raised here

I get the following error:

Traceback (most recent call last):

  File "<ipython-input-7-b043a7a2425f>", line 1, in <module>
    runfile('C:/Users/cnelias/Desktop/PHD/Swing project/code/script/', wdir='C:/Users/cnelias/Desktop/PHD/Swing project/code/script')

  File "C:\Users\cnelias\Anaconda3\lib\site-packages\spyder_kernels\customize\", line 786, in runfile
    execfile(filename, namespace)

  File "C:\Users\cnelias\Anaconda3\lib\site-packages\spyder_kernels\customize\", line 110, in execfile
    exec(compile(, filename, 'exec'), namespace)

  File "C:/Users/cnelias/Desktop/PHD/Swing project/code/script/", line 191, in <module>
    model = GPT2Model(config)

  File "C:\Users\cnelias\Anaconda3\lib\site-packages\transformers\", line 355, in __init__
    self.h = nn.ModuleList([Block(config.n_ctx, config, scale=True) for _ in range(config.n_layer)])

  File "C:\Users\cnelias\Anaconda3\lib\site-packages\transformers\", line 355, in <listcomp>
    self.h = nn.ModuleList([Block(config.n_ctx, config, scale=True) for _ in range(config.n_layer)])

  File "C:\Users\cnelias\Anaconda3\lib\site-packages\transformers\", line 223, in __init__
    self.attn = Attention(nx, n_ctx, config, scale)

  File "C:\Users\cnelias\Anaconda3\lib\site-packages\transformers\", line 109, in __init__
    assert n_state % config.n_head == 0

What does it mean and how can I solve it?

Also, more generally, is there documentation on how to do a forward call with GPT2? Can I define my own train() function, or do I have to use the model's built-in function? Am I forced to use a Dataset for training, or can I feed it individual tensors? I looked for answers to these in the docs but couldn't find any; maybe I missed something.

PS: I have already read the blog post, but it omits too much information and too many details to be useful for my application.


  • I think the error message is pretty clear:

    assert n_state % config.n_head == 0

    Tracing it back through the code, we can see

    n_state = nx # in Attention: n_state=768

    which indicates that n_state represents the embedding dimension (which is 768 by default in GPT-2 and in BERT-base-like models). Looking at the GPT-2 documentation, the parameter that specifies this is n_embd, which you are setting to 5. As the assertion indicates, the embedding dimension has to be evenly divisible by the number of attention heads, which you specified as 4, and 5 % 4 is 1, not 0. The reason is that the embedding is split into n_head equally sized slices, one per head. Choosing an embedding dimension that is a multiple of 4 (for example 8) should solve the problem. You can also change the number of heads instead, as long as it divides the embedding dimension; any embedding dimension that is not a multiple of the head count will be rejected.
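
    As a rough illustration of the divisibility rule, here is a minimal sketch in plain Python. The helper names check_gpt2_dims and round_up_to_multiple are hypothetical (they are not part of transformers), and the values 5 and 4 are the ones inferred above for EMBEDSIZE and NUMHEAD:

```python
def check_gpt2_dims(n_embd: int, n_head: int) -> int:
    """Return the per-head dimension, or raise if n_embd is not divisible by n_head.

    Hypothetical helper: it mirrors the condition in the failing assertion,
    it is not a function from the transformers library.
    """
    if n_embd % n_head != 0:
        raise ValueError(
            f"n_embd={n_embd} is not divisible by n_head={n_head} "
            f"(remainder {n_embd % n_head})"
        )
    return n_embd // n_head


def round_up_to_multiple(n_embd: int, n_head: int) -> int:
    """Return the smallest multiple of n_head that is >= n_embd (hypothetical helper)."""
    return ((n_embd + n_head - 1) // n_head) * n_head


# Values assumed from the discussion above.
EMBEDSIZE = 5
NUMHEAD = 4

fixed_embd = round_up_to_multiple(EMBEDSIZE, NUMHEAD)  # nearest valid size at or above 5
head_dim = check_gpt2_dims(fixed_embd, NUMHEAD)        # size of each head's slice
print(fixed_embd, head_dim)
```

    The resulting fixed_embd could then be passed as n_embd to GPT2Config in place of EMBEDSIZE. Whether rounding up or picking a different head count is the better choice depends on your model size budget.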