python pytorch runtime-error gpu

How to fix PyTorch RuntimeError: CUDA error: out of memory?


I'm trying to train my PyTorch model on a remote server using a GPU. However, training never starts, and I get the following error instead: RuntimeError: CUDA error: out of memory

I reinstalled PyTorch with CUDA 11 in case my CUDA version was incompatible with the GPU I use (NVIDIA GeForce RTX 3080). It still doesn't work.

I also ran torch.cuda.empty_cache(), and it still doesn't work.

When I run the code below in my interpreter, it still raises RuntimeError: CUDA error: out of memory

import torch
print(torch.rand(1, device="cuda"))

However, it works on the CPU:

import torch
print(torch.rand(1, device="cpu"))

When I run nvidia-smi, I get the following output:

[Screenshot of nvidia-smi output; image not available]

How can I fix it?


Solution

  • The problem is that the GPU you are trying to use is already occupied by another process. To check this:

    1. Run nvidia-smi in the terminal. This confirms that your GPU drivers are installed and shows the load on each GPU. If the command fails or doesn't list your GPU, check your driver installation.
    2. If the GPU already shows memory in use (above 0%), another process is occupying it. You can close that process (don't do that in a shared environment!) or run your job on another GPU, if a free one is available.
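The check above can also be scripted. What follows is only a minimal sketch, assuming nvidia-smi is on the PATH and reports plain integer MiB values for every GPU. The helper names (parse_gpu_memory, pick_least_used_gpu, select_free_gpu) are made up for illustration and are not part of PyTorch or any NVIDIA tool. It reads per-GPU memory usage through nvidia-smi's CSV query mode and points CUDA at the least-used GPU via CUDA_VISIBLE_DEVICES.

```python
import os
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'index, memory.used, memory.total' rows into (index, used_mib, total_mib)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Assumption: every field is a plain integer (no "[N/A]" entries).
        index, used, total = (field.strip() for field in line.split(","))
        gpus.append((int(index), int(used), int(total)))
    return gpus


def pick_least_used_gpu(gpus):
    """Return the index of the GPU with the least memory in use, or None if there are none."""
    if not gpus:
        return None
    return min(gpus, key=lambda gpu: gpu[1])[0]


def select_free_gpu():
    """Query nvidia-smi and expose only the least-used GPU to CUDA (hypothetical helper)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    gpus = parse_gpu_memory(result.stdout)
    best = pick_least_used_gpu(gpus)
    if best is not None:
        # Make CUDA number devices in the same (PCI bus) order as nvidia-smi.
        os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
        # Must be set before the first CUDA call; safest is before importing torch.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(best)
    return best
```

If you call select_free_gpu() at the very top of the training script, before import torch, then device="cuda" should resolve to the selected GPU. If every GPU is full, this doesn't help: the memory is only released when the process holding it exits.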