How can I specify which GPU to use with the Hugging Face Trainer?


Hugging Face provides TrainingArguments, which I configure as shown below. When I train my model with the Hugging Face Trainer, it uses cuda:0 by default.

I went through the Hugging Face docs, but I still don't know how to specify which GPU the Trainer should run on.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size per device during evaluation
    warmup_steps=500,                # number of warmup steps for the learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
)

Solution

  • The most common and practical way to control which GPU to use is to set the CUDA_VISIBLE_DEVICES environment variable.

    To set it on the command line when running a Python script:

    CUDA_VISIBLE_DEVICES=1 python train.py
    

    Alternatively, set it inside the script itself, before importing PyTorch or any other library that initializes CUDA (such as Hugging Face Transformers):

    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # or "0,1" for multiple GPUs
    

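    This way, regardless of how many GPUs the machine has, the Hugging Face Trainer can only see and use the GPU(s) you specified. Note that the visible devices are renumbered from zero inside the process, so with CUDA_VISIBLE_DEVICES=1 the physical GPU 1 appears as cuda:0.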
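To check that the restriction took effect, something like the following sketch may help. The helper name `select_gpus` is hypothetical (it is not part of Transformers or PyTorch), and the PyTorch check is skipped if PyTorch is not installed or no GPU is visible.

```python
import os


def select_gpus(gpu_ids):
    """Hypothetical helper: expose only the given physical GPU indices to this process."""
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# This must run before torch (or transformers) is imported and initializes CUDA.
visible = select_gpus([1])

try:
    import torch  # imported only after the variable is set
except ImportError:
    torch = None  # assumption: PyTorch may be missing, in which case the check is skipped

if torch is not None and torch.cuda.is_available():
    # Visible devices are renumbered from zero, so physical GPU 1 shows up as cuda:0.
    print("visible GPU count:", torch.cuda.device_count())
    print("cuda:0 is:", torch.cuda.get_device_name(0))
```

If the reported count is larger than expected, CUDA was most likely initialized before the variable was set, so move the `select_gpus` call (or the `os.environ` assignment) to the very top of the script.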