I'm trying to train my PyTorch model on a remote server using a GPU. However, the training phase doesn't start, and I get the following error instead: RuntimeError: CUDA error: out of memory
I reinstalled PyTorch with CUDA 11 in case my version of CUDA was not compatible with the GPU I use (NVIDIA GeForce RTX 3080). It still doesn't work.
I also ran torch.cuda.empty_cache(), and it still doesn't work.
When I run the code below in my interpreter, it still raises RuntimeError: CUDA error: out of memory
import torch
print(torch.rand(1, device="cuda"))
However, it works on the CPU:
import torch
print(torch.rand(1, device="cpu"))
When I run the command nvidia-smi, I get the following output:
How can I fix it?
The problem here is that the GPU that you are trying to use is already occupied by another process. The steps for checking this are:
Run nvidia-smi in the terminal. This checks whether your GPU drivers are installed and shows the load on each GPU. If it fails, or doesn't show your GPU, check your driver installation.
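If you'd rather do this check from Python, here is a rough sketch of the same idea. It assumes nvidia-smi is on your PATH and uses its --query-gpu option to read per-GPU memory usage. The helper names parse_gpu_memory and query_gpu_memory are made up for this example and are not part of PyTorch or any NVIDIA library.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'index, memory.used, memory.total' CSV rows (values in MiB).

    Assumes every field is a plain integer; some setups may report
    non-numeric values, which this sketch does not handle.
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (field.strip() for field in line.split(","))
        gpus.append(
            {"index": int(index), "used_mib": int(used), "total_mib": int(total)}
        )
    return gpus


def query_gpu_memory():
    """Return per-GPU memory usage, or None if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    gpus = query_gpu_memory()
    if gpus is None:
        print("nvidia-smi failed or is not installed; check the driver installation.")
    else:
        for gpu in gpus:
            print(f"GPU {gpu['index']}: {gpu['used_mib']} / {gpu['total_mib']} MiB used")
```

If a GPU shows almost all of its memory in use before your script has allocated anything, that is consistent with another process occupying it.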