I am trying to use a GPT-2 architecture for musical applications and consequently need to train it from scratch. After a bit of googling, I found that issue #1714 on huggingface's GitHub had already "solved" the question. But when I try to run the proposed solution:
import numpy as np  # needed for np.floor below
from transformers import GPT2Config, GPT2Model

NUMLAYER = 4
NUMHEAD = 4
SIZEREDUCTION = 10  # the factor by which we reduce the size of the velocity argument
VELSIZE = int(np.floor(127 / SIZEREDUCTION)) + 1
SEQLEN = 40  # size of data sequences
EMBEDSIZE = 5

config = GPT2Config(vocab_size=VELSIZE, n_positions=SEQLEN, n_embd=EMBEDSIZE,
                    n_layer=NUMLAYER, n_ctx=SEQLEN, n_head=NUMHEAD)
model = GPT2Model(config)
I get the following error:
Traceback (most recent call last):
  File "<ipython-input-7-b043a7a2425f>", line 1, in <module>
    runfile('C:/Users/cnelias/Desktop/PHD/Swing project/code/script/GPT2.py', wdir='C:/Users/cnelias/Desktop/PHD/Swing project/code/script')
  File "C:\Users\cnelias\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
    execfile(filename, namespace)
  File "C:\Users\cnelias\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/cnelias/Desktop/PHD/Swing project/code/script/GPT2.py", line 191, in <module>
    model = GPT2Model(config)
  File "C:\Users\cnelias\Anaconda3\lib\site-packages\transformers\modeling_gpt2.py", line 355, in __init__
    self.h = nn.ModuleList([Block(config.n_ctx, config, scale=True) for _ in range(config.n_layer)])
  File "C:\Users\cnelias\Anaconda3\lib\site-packages\transformers\modeling_gpt2.py", line 355, in <listcomp>
    self.h = nn.ModuleList([Block(config.n_ctx, config, scale=True) for _ in range(config.n_layer)])
  File "C:\Users\cnelias\Anaconda3\lib\site-packages\transformers\modeling_gpt2.py", line 223, in __init__
    self.attn = Attention(nx, n_ctx, config, scale)
  File "C:\Users\cnelias\Anaconda3\lib\site-packages\transformers\modeling_gpt2.py", line 109, in __init__
    assert n_state % config.n_head == 0
What does it mean, and how can I solve it?
Also, more generally, is there documentation on how to do a forward call with GPT-2? Can I define my own train() function, or do I have to use the model's built-in function? Am I forced to use a Dataset to do the training, or can I feed it individual tensors? I looked for answers to these in the docs but couldn't find any; maybe I missed something.
PS: I have already read the blog post from huggingface.co, but it omits too much information and detail to be useful for my application.
I think the error message is pretty clear:
assert n_state % config.n_head == 0
Tracing it back through the code, we can see

n_state = nx  # in Attention: n_state=768

which indicates that n_state represents the embedding dimension (which defaults to 768 in BERT-like models). When we then look at the GPT-2 documentation, the parameter specifying this is n_embd, which you are setting to 5. As the error indicates, the embedding dimension has to be evenly divisible by the number of attention heads, which you specified as 4.
So, choosing an embedding dimension that is a multiple of 4 should solve the problem. Of course, you can also change the number of heads instead, but the same constraint applies either way: n_embd must be evenly divisible by n_head.
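To make this concrete, here is a minimal sketch of the fix together with a plain forward call. EMBEDSIZE = 8 is an arbitrary choice (any multiple of NUMHEAD works), and I am assuming a transformers version like yours, where GPT2Config accepts n_ctx:

import torch
from transformers import GPT2Config, GPT2Model

NUMLAYER = 4
NUMHEAD = 4
SEQLEN = 40
VELSIZE = 13   # int(np.floor(127 / 10)) + 1, as in your snippet
EMBEDSIZE = 8  # 8 % NUMHEAD == 0, so the assert in Attention passes

config = GPT2Config(vocab_size=VELSIZE, n_positions=SEQLEN, n_embd=EMBEDSIZE,
                    n_layer=NUMLAYER, n_ctx=SEQLEN, n_head=NUMHEAD)
model = GPT2Model(config)

# The model is a regular torch.nn.Module, so no Dataset is required:
# you can feed it individual tensors of token ids directly.
input_ids = torch.randint(0, VELSIZE, (1, SEQLEN))  # dummy batch of one sequence
outputs = model(input_ids)
hidden_states = outputs[0]  # shape: (1, SEQLEN, EMBEDSIZE)

Since it is an ordinary nn.Module, you can also write your own train() loop around it. If you want a ready-made language-modeling loss, GPT2LMHeadModel adds the output projection on top of GPT2Model and returns a loss when you pass labels (check your version's docs for the exact argument name).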