I tried running the example script from the official Hugging Face Transformers repository (Python 3.10.2, PyTorch 1.11.0, CUDA 11.3 installed) to fine-tune Sber's GPT-3 Large. Without modifying any files, I ran the script with these arguments:
--output_dir out --model_name_or_path sberbank-ai/rugpt3large_based_on_gpt2 --train_file dataset.txt --do_train --num_train_epochs 15 --overwrite_output_dir
and got

RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB

I also tried --block_size 32 and --per_device_train_batch_size 1, but without success.
My GPU is an RTX 2060 with 6 GB of VRAM. Is this simply a real lack of video memory? Can it be solved without buying a new GPU?
The GPT-3 models have an extremely large number of parameters and are therefore very memory-heavy. Just to give an idea: if I understand Sber AI's documentation correctly, the Large model was pre-trained on 128/16 V100 GPUs (32 GB each) for multiple days. Fine-tuning and inference are obviously much lighter on memory, but even they require serious hardware, at least for the larger models.
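To get a rough sense of why 6 GB is not enough, here is a back-of-the-envelope sketch (an estimate, not an exact measurement): it builds the Large model from its config alone, counts the parameters, and applies the common rule of thumb of about 16 bytes per parameter for plain fp32 Adam fine-tuning, before even counting activations.

    from transformers import AutoConfig, AutoModelForCausalLM

    # Instantiate the model from its config only; no pretrained weights
    # are downloaded, the parameters are just randomly initialized on CPU.
    config = AutoConfig.from_pretrained("sberbank-ai/rugpt3large_based_on_gpt2")
    model = AutoModelForCausalLM.from_config(config)

    n_params = sum(p.numel() for p in model.parameters())
    print(f"Parameters: {n_params / 1e6:.0f}M")

    # Rough rule of thumb for vanilla fp32 Adam fine-tuning:
    # 4 bytes (weights) + 4 (gradients) + 8 (Adam moment estimates) = 16 bytes/param,
    # not counting activations, which add even more.
    print(f"Estimated training memory: {n_params * 16 / 1024**3:.1f} GiB")

For a model in this size class the estimate alone already lands well above 6 GB, so the out-of-memory error is expected even with a batch size of 1.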
You can try the Medium or Small model and see whether that works for you. You can also run it in a cloud service like Google Colab; they even have a notebook that demonstrates this. Make sure to activate GPU usage in Colab's notebook settings. The free tier gives you a decent GPU, and if you are more serious about this, the Pro version offers better hardware, which is probably a lot cheaper than buying a GPU more powerful than an RTX 2060 at current prices. Of course, Google is not the only option; many cloud providers let you run large-model training or fine-tuning.
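If you go the Colab route, a quick sanity check like the following (PyTorch is preinstalled in Colab notebooks) confirms that the GPU runtime is actually enabled before you launch fine-tuning:

    import torch

    # Confirm the GPU runtime is active (Runtime -> Change runtime type -> GPU)
    # and see how much memory it has before starting a long training run.
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
    else:
        print("No GPU detected -- enable it in the notebook settings first.")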