I'm trying to train my PyTorch model on a remote server using a GPU. However, the training phase doesn't start, and I get the following error instead: RuntimeError: CUDA error: out of memory
I reinstalled PyTorch with CUDA 11 in case my CUDA version was not compatible with the GPU I use (NVIDIA GeForce RTX 3080). It still doesn't work.
I also ran torch.cuda.empty_cache(), and it still doesn't work.
When I run the code below in my interpreter, it still raises RuntimeError: CUDA error: out of memory
import torch
print(torch.rand(1, device="cuda"))
However, it works on the CPU:
import torch
print(torch.rand(1, device="cpu"))
When I run the command nvidia-smi, I get the following output:
How can I fix it?
The problem here is that the GPU that you are trying to use is already occupied by another process. The steps for checking this are:
Run nvidia-smi in the terminal. This checks whether your GPU drivers are installed and shows the load on each GPU. If it fails, or doesn't show your GPU, check your driver installation.
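If you would rather run this check from Python, here is a minimal sketch that calls nvidia-smi through subprocess and reports how much memory is already in use on each GPU. It deliberately avoids torch, because creating a CUDA context can itself fail when the card is full. The helper names parse_gpu_memory and query_gpus are made up for this example, and the parsing assumes nvidia-smi reports plain integer MiB values for every GPU.

```python
import shutil
import subprocess

# Ask nvidia-smi for machine-readable CSV instead of the default table.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn nvidia-smi CSV rows into a list of dicts (memory in MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Assumption: four comma-separated fields with numeric memory values.
        index, name, used, total = [field.strip() for field in line.split(",")]
        gpus.append(
            {
                "index": int(index),
                "name": name,
                "used_mib": int(used),
                "total_mib": int(total),
            }
        )
    return gpus


def query_gpus():
    """Return per-GPU memory info, or None if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if gpus is None:
        print("nvidia-smi is missing or failed; check your driver installation")
    else:
        for gpu in gpus:
            print(
                f"GPU {gpu['index']} ({gpu['name']}): "
                f"{gpu['used_mib']} / {gpu['total_mib']} MiB in use"
            )
```

If a GPU shows most of its memory in use before your script has allocated anything, another process is probably holding it, which would explain why even a one-element tensor fails to allocate.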