Tags: pytorch, language-model, huggingface-transformers

Size of the training data of GPT2-XL pre-trained model


In Hugging Face Transformers, it is possible to use the pre-trained GPT2-XL language model. But I can't find which dataset it was trained on. Is it the same trained model that OpenAI used for their paper (trained on the 40GB dataset called WebText)?


Solution

  • The GPT2-XL model is the biggest of the four architectures detailed in the paper you linked (1542M parameters). It is trained on the same data as the other three, namely the WebText dataset you mention.
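
    For reference, here is a minimal sketch of loading that checkpoint from the Transformers library (assuming a recent transformers version; the model id on the Hugging Face Hub is "gpt2-xl") and confirming its ~1.5B parameter count:

    # Minimal sketch: load the pre-trained GPT2-XL checkpoint from Hugging Face
    # Transformers. This is the same WebText-trained checkpoint released by OpenAI.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
    model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

    # Count parameters; this should come out at roughly 1.5 billion for GPT2-XL.
    n_params = sum(p.numel() for p in model.parameters())
    print(f"GPT2-XL parameters: {n_params:,}")

    # Quick sanity check: generate a short greedy continuation.
    inputs = tokenizer("Machine learning is", return_tensors="pt")
    outputs = model.generate(inputs["input_ids"], max_length=30, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))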