pytorch nvidia bert-language-model huggingface-transformers gcp-ai-platform-notebook

GCP AI Platform Notebook driver too old?

I am trying to run the following Hugging Face Transformers tutorial on GCP's AI Platform Notebook with 32 vCPUs, 208 GB RAM, and 2 NVIDIA Tesla T4s.

However, when I try to run the part

model = DistillBERTClass()

model.to(device)

I get the following Assertion Error:

AssertionError: The NVIDIA driver on your system is too old (found version 10010).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.

However, when I run !nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P0    22W /  70W |     10MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   39C    P8    10W /  70W |     10MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |

The version on the NVIDIA driver is compatible with the latest PyTorch version, which I am using. Has anyone else ran into this error, and is there a way around it?

Solution

You can try a newer NVIDIA driver version, we support latest CUDA 11 driver version, and then install Pytorch on top of it:

gcloud beta notebooks instances create cuda11 \
--vm-image-project=deeplearning-platform-release \
--vm-image-family=common-cu110-notebooks-debian-9 \
--machine-type=n1-standard-1 \
--location=us-west1-a \
--format=json

Image family:

common-cu110-notebooks-debian-9
common-cu110-notebooks-debian-10