Tags: pytorch, language-model, huggingface-transformers

Size of the training data of GPT2-XL pre-trained model


In Hugging Face Transformers, it is possible to use the pre-trained GPT2-XL language model. But I can't find which dataset it was trained on. Is it the same trained model that OpenAI used for their paper (trained on the 40GB dataset called WebText)?


Solution

  • The GPT2-XL model is the biggest of the four architectures detailed in the paper you linked (1542M parameters). It is trained on the same data as the other three, namely the WebText dataset you mention.
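
    For reference, here is a minimal sketch of loading that checkpoint from the Transformers library (assuming a recent transformers version; the model id on the Hugging Face Hub is "gpt2-xl") and confirming its ~1.5B parameter count:

    # Minimal sketch: load the pre-trained GPT2-XL checkpoint from Hugging Face
    # Transformers. This is the same WebText-trained checkpoint released by OpenAI.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
    model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

    # Count parameters; this should come out at roughly 1.5 billion for GPT2-XL.
    n_params = sum(p.numel() for p in model.parameters())
    print(f"GPT2-XL parameters: {n_params:,}")

    # Quick sanity check: generate a short greedy continuation.
    inputs = tokenizer("Machine learning is", return_tensors="pt")
    outputs = model.generate(inputs["input_ids"], max_length=30, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))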