Search code examples
pytorchhuggingface-transformershuggingface-tokenizers

How do I prevent a lack of VRAM halfway through training a Huggingface Transformers (Pegasus) model?


I'm taking a pre-trained pegasus model through Huggingface transformers, (specifically, google/pegasus-cnn_dailymail, and I'm using Huggingface transformers through Pytorch) and I want to finetune it on my own data. This is however quite a large dataset and I've run into the problem of running out of VRAM halfway through training, which because of the size of the dataset can be a few days after training even started, which makes a trial-and-error approach very inefficient.

I'm wondering how I can make sure ahead of time that it doesn't run out of memory. I would think that the memory usage of the model is in some way proportional to the size of the input, so I've passed truncation=True, padding=True, max_length=1024 to my tokenizer, which if my understanding is correct should make all the outputs of the tokenizer of the same size per line. Considering that the batch size is also a constant, I would think that the amount of VRAM in use should be stable. So I should just be able to cut up the dataset into managable parts, just looking at the ram/vram use of the first run, and infer that it will run smoothly from start to finish.

However, the opposite seems to be true. I've been observing the amount of VRAM used at any time and it can vary wildly, from ~12GB at one time to suddenly requiring more than 24GB and crashing (because I don't have more than 24GB).

So, how do I make sure that the amount of vram in use will stay within reasonable bounds for the full duration of the training process, and avoid it crashing due to a lack of vram when I'm already days into the training process?


Solution

  • padding=True actually doesn't pad to max_length, but to the longest sample in the list you pass to the tokenizer. To pad to max_length you need to set padding='max_length'.