python google-colaboratory training-data huggingface-transformers huggingface-datasets

Is there a way to select a device when running a python script on Google Colab?


I am attempting to run run_language_modeling.py, a Python script from Hugging Face. However, when I run it, I've noticed that it uses only my CPU and not the GPU (even though the runtime is set to use a GPU). So I'm looking for a way to tell the script to use the GPU.

Here's what I have...

To verify that a GPU is available, I run: !nvidia-smi

This shows:

Fri Feb 25 11:55:13 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
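`nvidia-smi` only shows that the driver can see the Tesla K80. The training script runs on PyTorch, so what matters is whether PyTorch, in the same Python environment, can see the card. Below is a minimal diagnostic sketch (the helper name `describe_gpu_visibility` is hypothetical) that reports this and still runs if PyTorch is not installed:

```python
import os


def describe_gpu_visibility():
    """Report what the current Python process can see of the GPU (hypothetical helper)."""
    info = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}
    try:
        import torch
    except ImportError:
        info["torch_installed"] = False
        return info

    info["torch_installed"] = True
    # torch.version.cuda is None for a CPU-only PyTorch build.
    info["torch_cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    info["device_count"] = torch.cuda.device_count()
    if info["cuda_available"]:
        info["device_0_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_gpu_visibility().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False` here while `nvidia-smi` lists the K80, two common causes are a CPU-only PyTorch build and a `CUDA_VISIBLE_DEVICES` value that hides the card.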

Then I run the following command, which calls the .py file:

!python ./run_language_modeling.py \
    --output_dir=output \
    --model_type=bert \
    --do_train \
    --train_data_file=train.txt \
    --do_eval \
    --eval_data_file=test.txt \
    --per_gpu_train_batch_size 8 \
    --per_gpu_eval_batch_size 4 \
    --num_train_epochs 20 \
    --output_dir ./ \
    --save_steps 1000 \
    --save_total_limit 2 \
    --mlm \
    --overwrite_output_dir \
    --block_size 128 \
    --line_by_line \
    --tokenizer_name bert-base-uncased 

While this runs, CPU usage climbs to 100%. I assumed there might be a flag like --device, but I haven't been able to find one. Some other posts I've seen online suggest doing the following:

import os
os.environ["CUDA_VISIBLE_DEVICES"]="1"
tf_device='/gpu:0'

This is supposed to select the GPU I want, but as far as I can tell it has no effect. I also tried:

%%shell

export CUDA_VISIBLE_DEVICES=0
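The first attempt is worth checking against how `CUDA_VISIBLE_DEVICES` is interpreted. It is a comma-separated list of physical device indices; the CUDA runtime exposes the listed devices in that order, renumbered from 0, and according to NVIDIA's documentation it stops at the first invalid index. The sketch below is a simplified model of that rule (integer indices only, no GPU UUIDs; the function name `visible_gpus` is hypothetical). It shows why `"1"` on a runtime with a single GPU, like this K80 at index 0, leaves no GPU visible at all. Also note that `tf_device='/gpu:0'` only assigns a string that nothing reads.

```python
def visible_gpus(cuda_visible_devices, physical_gpu_count):
    """Simplified model of how CUDA maps CUDA_VISIBLE_DEVICES to device ids.

    Returns a dict {cuda_index: physical_index}. Only integer indices are
    handled; GPU UUIDs and MIG identifiers are ignored in this sketch.
    """
    if cuda_visible_devices is None:
        # Variable unset: every physical GPU is visible, in its natural order.
        return {i: i for i in range(physical_gpu_count)}

    mapping = {}
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        try:
            physical = int(token)
        except ValueError:
            break  # Non-integer (or empty) token: treated as invalid here.
        if not 0 <= physical < physical_gpu_count:
            break  # The first invalid index hides itself and everything after it.
        mapping[len(mapping)] = physical  # Visible devices are renumbered from 0.
    return mapping


# Colab runtime from the question: one Tesla K80 at physical index 0.
print(visible_gpus(None, 1))  # {0: 0}
print(visible_gpus("0", 1))   # {0: 0}
print(visible_gpus("1", 1))   # {}  -> PyTorch would see no CUDA device
```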

Any suggestions?
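On the `%%shell` attempt: a Colab `%%shell` cell runs in its own shell process, so an `export` there only lasts for that cell and is not inherited by a later `!python` command. A `!python` command does inherit the notebook kernel's `os.environ`, so setting the variable from Python in the notebook does reach the script. If you want the choice to be explicit per run, one option is to pass the variable only to the child process. A sketch using the standard `subprocess` module; the helpers `gpu_env` and `run_with_gpu` are hypothetical, and the example call shows only some of the flags from the command above:

```python
import os
import subprocess
import sys


def gpu_env(gpu_index):
    """Return a copy of the current environment that exposes only one GPU.

    CUDA renumbers the visible devices, so inside the child process
    the selected card is always cuda:0.
    """
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_with_gpu(script_args, gpu_index=0):
    """Run a Python script with only the chosen GPU visible (hypothetical helper)."""
    cmd = [sys.executable] + list(script_args)
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(cmd, env=gpu_env(gpu_index), check=True)


# Example for the command in the question (only some flags shown; add the rest).
# On a single-GPU Colab runtime the only valid index is 0.
# run_with_gpu(["./run_language_modeling.py", "--model_type=bert", "--do_train",
#               "--train_data_file=train.txt", "--tokenizer_name", "bert-base-uncased"])
```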


Solution

import torch

# Default to the CPU, and switch to the first GPU if PyTorch can see one.
device = torch.device("cpu")

if torch.cuda.is_available():
    print("Training on GPU")
    device = torch.device("cuda:0")
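The snippet above only picks a device; in a hand-written PyTorch loop nothing runs on the GPU until the model and each batch are moved there. A minimal sketch of that step, with a toy `torch.nn.Linear` model and random data standing in for a real model and batch (both are assumptions, not part of `run_language_modeling.py`). As far as I know, the versions of the Hugging Face script that take these flags choose the device themselves (CUDA whenever `torch.cuda.is_available()` is `True`, unless `--no_cuda` is passed), so for the script the deciding factor is whether PyTorch can see the GPU in that process.

```python
import torch

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Toy model and batch standing in for a real model and data (assumption).
model = torch.nn.Linear(16, 4).to(device)  # moves the parameters to `device`
batch = torch.randn(8, 16, device=device)  # creates the batch directly on `device`

output = model(batch)
print(output.device, tuple(output.shape))
```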