Tags: python, python-3.x, nlp, huggingface-transformers

HuggingFace Training using GPU


Based on the HuggingFace script for training a transformers model from scratch, I run:

python3 run_mlm.py \
--dataset_name wikipedia \
--tokenizer_name roberta-base \
--model_type roberta \
--dataset_config_name 20200501.en \
--do_train \
--do_eval \
--learning_rate 1e-5 \
--num_train_epochs 5 \
--save_steps 5000 \
--warmup_steps=10000 \
--seed 666 \
--gradient_accumulation_steps=4 \
--output_dir models/mlm_wikipedia_scratch/ \
--per_gpu_train_batch_size 8

I don't understand why my python3 process doesn't show up on the GPU when I run nvidia-smi. Here are screenshots: top | nvidia-smi | training_script
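A quick way to see, from Python, exactly which processes nvidia-smi reports as using a GPU is to query its compute-apps list. This is a minimal sketch, assuming `nvidia-smi` is on `PATH`; the names `GpuProcess`, `parse_compute_apps` and `list_gpu_processes` are hypothetical helpers. If training really runs on the GPU, the python3 PID shown by `top` should appear in the output.

```python
import shutil
import subprocess
from typing import List, NamedTuple, Optional


class GpuProcess(NamedTuple):
    pid: int
    name: str
    used_memory_mib: Optional[int]  # None when nvidia-smi reports "[N/A]"


def parse_compute_apps(csv_text: str) -> List[GpuProcess]:
    """Parse output of nvidia-smi --query-compute-apps=... --format=csv,noheader,nounits."""
    processes = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line looks like "<pid>, <process_name>, <used_memory>".
        pid, rest = line.split(",", 1)
        name, memory = rest.rsplit(",", 1)
        memory = memory.strip()
        processes.append(
            GpuProcess(int(pid), name.strip(), int(memory) if memory.isdigit() else None)
        )
    return processes


def list_gpu_processes() -> List[GpuProcess]:
    """Return the processes nvidia-smi lists as GPU users (empty if nvidia-smi is missing)."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for proc in list_gpu_processes():
        print(proc.pid, proc.name, proc.used_memory_mib)
```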


Solution

  • You have to make sure the following are correct:

    1. The GPU is correctly set up in your environment, i.e. PyTorch can see it:
    In [1]: import torch
    In [2]: torch.cuda.is_available()
    Out[2]: True
    
    2. Specify the GPU you want to use:
    export CUDA_VISIBLE_DEVICES=X        # X = index of the GPU to use, e.g. 0, 1 or 2
    echo $CUDA_VISIBLE_DEVICES           # Check: should print the GPU index you set
    

    Run the script again from the same shell, and the python3 process should now appear in nvidia-smi.
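Both checks can also be scripted. The sketch below is a hypothetical launcher; the names `env_with_gpu` and `cuda_report` and the GPU index `0` are assumptions, not part of the original answer. It prints what PyTorch sees in the current process (step 1), then, if any flags are passed, reruns `run_mlm.py` with `CUDA_VISIBLE_DEVICES` set in the child's environment, which has the same effect as the `export` in step 2.

```python
import os
import subprocess
import sys
from typing import Dict, Optional


def env_with_gpu(gpu_index: int, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with CUDA_VISIBLE_DEVICES pinned to a single GPU index."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def cuda_report() -> Dict[str, object]:
    """Summarise what PyTorch can see in this process (step 1)."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False,
                "visible_devices": visible, "devices": []}
    available = torch.cuda.is_available()
    devices = (
        [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
        if available
        else []
    )
    return {"torch_installed": True, "cuda_available": available,
            "visible_devices": visible, "devices": devices}


if __name__ == "__main__":
    print(cuda_report())
    if len(sys.argv) > 1:
        # Forward all flags unchanged to run_mlm.py, restricted to GPU 0 (assumed index).
        subprocess.run(
            [sys.executable, "run_mlm.py", *sys.argv[1:]],
            env=env_with_gpu(0),
            check=True,
        )
```

For example, saving it as `launch_gpu0.py` (a hypothetical filename) and running `python3 launch_gpu0.py --dataset_name wikipedia --model_type roberta` followed by the remaining flags from the question forwards them to `run_mlm.py` on GPU 0 only.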