I have spent considerable time trying to debug some pytorch code which I have created a minimal example of for the purpose of helping to better understand what the issue might be.
I have removed all necessary portions of the code which are unrelated to the issue so the remaining piece of code won't make much sense from a functional standpoint but it still displays the error I'm facing.
The overall task I'm working on is in a loop and every pass of the loop is computing the embedding of the image and adding it to a variable storing it. It's effectively aggregating it (not concatenating, so the size remains the same). I don't expect the number of iterations to force the datatype to overflow, I don't see this happening here nor in my code.
RuntimeError: CUDA out of memory.
.My environment is as follows:
- python 3.6.2
- Pytorch 1.4.0
- Cudatoolkit 10.0
- Driver version 410.78
- GPU: Nvidia GeForce GT 1030 (2GB VRAM)
(though I've replicated this experiment with the same result on a Titan RTX with 24GB,
same pytorch version and cuda toolkit and driver, it only goes out of memory further in the loop).
Complete code below. I have marked 2 lines as culprits, as deleting them removes the issue, though obviously I need to find a way to execute them without having memory issues. Any help would be much appreciated! You may try with any image named "source_image.bmp" to replicate the issue.
import torch
from PIL import Image
import torchvision
from torchvision import transforms
from pynvml import nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo, nvmlInit
import sys
import os
os.environ["CUDA_VISIBLE_DEVICES"]='0' # this is necessary on my system to allow the environment to recognize my nvidia GPU for some reason
os.environ['CUDA_LAUNCH_BLOCKING'] = '1' # to debug by having all CUDA functions executed in place
torch.set_default_tensor_type('torch.cuda.FloatTensor')
# Preprocess image
tfms = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),])
img = tfms(Image.open('source_image.bmp')).unsqueeze(0).cuda()
model = torchvision.models.resnet50(pretrained=True).cuda()
model.eval() # we put the model in evaluation mode, to prevent storage of gradient which might accumulate
nvmlInit()
h = nvmlDeviceGetHandleByIndex(0)
info = nvmlDeviceGetMemoryInfo(h)
print(f'Total available memory : {info.total / 1000000000}')
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])
orig_embedding = feature_extractor(img)
embedding_depth = 2048
mem0 = 0
embedding = torch.zeros(2048, img.shape[2], img.shape[3]) #, dtype=torch.float)
patch_size=[4,4]
patch_stride=[2,2]
patch_value=0.0
# Here, we iterate over the patch placement, defined at the top left location
for row in range(img.shape[2]-1):
for col in range(img.shape[3]-1):
print("######################################################")
######################################################
# Isolated line, culprit 1 of the GPU memory leak
######################################################
patched_embedding = feature_extractor(img)
delta_embedding = (patched_embedding - orig_embedding).view(-1, 1, 1)
######################################################
# Isolated line, culprit 2 of the GPU memory leak
######################################################
embedding[:,row:row+1,col:col+1] = torch.add(embedding[:,row:row+1,col:col+1], delta_embedding)
print("img size:\t\t", img.element_size() * img.nelement())
print("patched_embedding size:\t", patched_embedding.element_size() * patched_embedding.nelement())
print("delta_embedding size:\t", delta_embedding.element_size() * delta_embedding.nelement())
print("Embedding size:\t\t", embedding.element_size() * embedding.nelement())
del patched_embedding, delta_embedding
torch.cuda.empty_cache()
info = nvmlDeviceGetMemoryInfo(h)
print("\nMem usage increase:\t", info.used / 1000000000 - mem0)
mem0 = info.used / 1000000000
print(f'Free:\t\t\t {(info.total - info.used) / 1000000000}')
print("Done.")
Add this to your code as soon as you load the model
for param in model.parameters():
param.requires_grad = False
from https://pytorch.org/docs/stable/notes/autograd.html#excluding-subgraphs-from-backward