Tags: python, pytorch, torchvision

How to avoid "RuntimeError: CUDA out of memory" during inference on a single image?


I am facing the famous "CUDA out of memory" error.

File "DATA\instance-mask-r-cnn-torch\venv\lib\site-packages\torchvision\models\detection\roi_heads.py", line 416, in paste_mask_in_image
    im_mask = torch.zeros((im_h, im_w), dtype=mask.dtype, device=mask.device)
RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 2.00 GiB total capacity; 1.66 GiB already allocated; 0 bytes free; 1.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

[Screenshot: Windows Task Manager showing GPU memory usage]

Windows 10, CUDA 11.3, torch 1.11.0+cu113, torchvision 0.12.0+cu113

In the environment I played with PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb set to 8, 24, 32, 128... without success.
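For reference, a minimal sketch of how I set the option from Python rather than the shell; the allocator reads the variable at the time of the first CUDA allocation, so it has to be set before any CUDA work (safest: before importing torch):

import os

# must be set before the first CUDA allocation
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:32'

import torch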

An image of size 640x512 (1.5 MB on disk) works, but another one of size 3264x1840 (1.75 MB on disk) leads to an OOM error.

import torchvision.transforms
from torchvision.models.detection import mask_rcnn
import torch
from PIL import Image
import gc

if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name(0)}')
    device = torch.device('cuda')
    torch.cuda.empty_cache()
else:
    device = torch.device('cpu')

print(f'Device: {device}')

model = mask_rcnn.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()  # evaluation mode: disables dropout and batch-norm updates
model.to(device)

# test images of various sizes; only the last assignment takes effect
img_path = 'images/tv_image05.png'
img_path = 'images/DJI_20220519110029_0001_W.JPG'
img_path = 'images/DJI_20220519110143_0021_T.JPG'
img_path = 'images/WP_20160104_09_52_53_Pro.jpg'

img = Image.open(img_path).convert("RGB")

img_tensor = torchvision.transforms.functional.to_tensor(img)  # float32 tensor, shape (C, H, W), values in [0, 1]

with torch.no_grad():  # no autograd graph, so less memory is used during inference
    predictions = model([img_tensor.to(device)])

print(predictions)

gc.collect()
torch.cuda.empty_cache()

So far I have found lots of hints about reducing the batch size, but I am not training; the error occurs during inference on a single image. What else can I do to be able to process images of sizes up to 7 MB?


Solution

  • The 3264x1840 image is going to be about 72 MB as a float32 tensor (3264 × 1840 × 3 channels × 4 bytes). Since your 640x512 image works, I'd suggest resizing the larger one before inference, e.g. with torchvision.transforms.functional.resize(img, 512), as sketched below.
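    A minimal sketch of that change, assuming the rest of your pipeline (model, device, img_path) stays as posted; with an int size, resize scales the smaller edge to 512 px and keeps the aspect ratio:

    import torchvision.transforms.functional as F

    img = Image.open(img_path).convert("RGB")
    img = F.resize(img, 512)       # smaller edge -> 512 px, aspect ratio preserved
    img_tensor = F.to_tensor(img)  # now a similar footprint to the working 640x512 case

    with torch.no_grad():
        predictions = model([img_tensor.to(device)])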

  • Another common trick is to cast the model and the image to float16 (half precision), which halves the memory per element, but this may degrade the model's accuracy depending on what you're doing; see the sketch below.
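    A minimal sketch of the half-precision variant, under the same assumptions as above; note that not every op is guaranteed to support float16, so treat this as something to try rather than a guaranteed fix:

    model.half()  # cast all weights and buffers to float16
    img_tensor = F.to_tensor(img).half()

    with torch.no_grad():
        predictions = model([img_tensor.to(device)])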