I am trying to run run_language_modeling.py, a Python script from Hugging Face. However, when I run it, I've noticed that it only uses my CPU instead of the GPU (even though the environment is set to use a GPU). So I'm looking for a way to tell the script to use the GPU.
Here's what I have...
To verify that I am using a GPU: !nvidia-smi
This shows:
Fri Feb 25 11:55:13 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Then I run the following command, which calls the .py file:
!python ./run_language_modeling.py \
--output_dir=output \
--model_type=bert \
--do_train \
--train_data_file=train.txt \
--do_eval \
--eval_data_file=test.txt \
--per_gpu_train_batch_size 8 \
--per_gpu_eval_batch_size 4 \
--num_train_epochs 20 \
--output_dir ./ \
--save_steps 1000 \
--save_total_limit 2 \
--mlm \
--overwrite_output_dir \
--block_size 128 \
--line_by_line \
--tokenizer_name bert-base-uncased
This continues until the CPU usage goes up to 100%. I assume there might be an option like --device, but I haven't been able to find it. Some other posts I've seen online mention that I can do:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="1"
tf_device='/gpu:0'
to select the GPU I want, but as far as I can tell it isn't doing anything. I also tried:
%%shell
export CUDA_VISIBLE_DEVICES=0
Any suggestions?
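One thing worth checking in the two attempts above: the nvidia-smi output lists a single GPU with index 0, so CUDA_VISIBLE_DEVICES="1" would hide that GPU rather than select it, and an export inside a %%shell cell only lasts for that cell's own shell. Below is a minimal sketch, assuming the script is launched as a child process, of passing the variable explicitly and confirming what the child sees. The helper names child_env and visible_devices_in_child are hypothetical, made up for illustration.

```python
import os
import subprocess
import sys


def child_env(gpu_index: str = "0") -> dict:
    """Return a copy of the current environment with CUDA_VISIBLE_DEVICES set."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpu_index
    return env


def visible_devices_in_child(gpu_index: str = "0") -> str:
    """Start a child Python process and report the value it actually sees."""
    probe = "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES'))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=child_env(gpu_index),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Index "0" matches the only GPU listed by nvidia-smi in the question.
    print(visible_devices_in_child("0"))
```

The same env dictionary could be passed to subprocess.run when launching run_language_modeling.py. Setting the variable only controls which GPUs are visible; whether the script then uses one still depends on PyTorch detecting CUDA.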
import torch

# Default to the CPU, then switch to the first GPU if CUDA is available.
device = torch.device("cpu")
if torch.cuda.is_available():
    print("Training on GPU")
    device = torch.device("cuda:0")
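Choosing the device is only half of it: the model and every batch also have to be moved to that device. Here is a minimal sketch, assuming PyTorch is installed. The tiny nn.Linear model and the random batch are placeholders for illustration, not what run_language_modeling.py actually builds.

```python
import torch
from torch import nn

# Fall back to the CPU when CUDA is not available.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Placeholder model and batch, for illustration only.
model = nn.Linear(16, 4).to(device)
batch = torch.randn(8, 16).to(device)

# Forward pass; the output lives on the same device as the model and batch.
with torch.no_grad():
    output = model(batch)

print(next(model.parameters()).device, output.device, output.shape)
```

If torch.cuda.is_available() returns False even though nvidia-smi shows the card, the installed PyTorch build is likely CPU-only or does not match the driver, and in that case no command-line flag will make the script use the GPU.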