I am facing the famous "CUDA out of memory" error.
File "DATA\instance-mask-r-cnn-torch\venv\lib\site-packages\torchvision\models\detection\roi_heads.py", line 416, in paste_mask_in_image
im_mask = torch.zeros((im_h, im_w), dtype=mask.dtype, device=mask.device)
RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 2.00 GiB total capacity; 1.66 GiB already allocated; 0 bytes free; 1.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Windows 10, CUDA 11.3, torch 1.11.0+cu113, torchvision 0.12.0+cu113
In the environment I tried PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32 (and also 128, 8, 24, ...) without success.
An image of size 640x512 (1.5 MB on disk) works; another one of size 3264x1840 (1.75 MB on disk) leads to the out-of-memory error.
import torchvision.transforms
from torchvision.models.detection import mask_rcnn
import torch
from PIL import Image
import gc
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name(0)}')
    device = torch.device('cuda')
    torch.cuda.empty_cache()
else:
    device = torch.device('cpu')
print(f'Device: {device}')
model = mask_rcnn.maskrcnn_resnet50_fpn(pretrained=True)
print(model.eval())
model.to(device)
img_path = 'images/tv_image05.png'
img_path = 'images/DJI_20220519110029_0001_W.JPG'
img_path = 'images/DJI_20220519110143_0021_T.JPG'
img_path = 'images/WP_20160104_09_52_53_Pro.jpg'
img = Image.open(img_path).convert("RGB")
img_tensor = torchvision.transforms.functional.to_tensor(img)
with torch.no_grad():
    predictions = model([img_tensor.cuda()])
print(predictions)
gc.collect()
torch.cuda.empty_cache()
So far I have found lots of hints about reducing the batch size, but I am not training, only running inference on a single image. What else can I do to be able to process image files of up to 7 MB?
The 3264x1840 image takes about 72 MB once it is converted to a float32 tensor, no matter how small the compressed file is. Since your 640x512 image works, I'd suggest resizing the large one.
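The 72 MB figure is just width x height x channels x bytes per value. A minimal sketch of that arithmetic, assuming 3 RGB channels and 4 bytes per float32 value (the helper name `tensor_bytes` is made up for illustration):

```python
def tensor_bytes(width, height, channels=3, bytes_per_value=4):
    """Size in bytes of a dense image tensor (float32 uses 4 bytes per value)."""
    return width * height * channels * bytes_per_value


small = tensor_bytes(640, 512)    # the image that works
large = tensor_bytes(3264, 1840)  # the image that runs out of memory

print(f"640x512:   {small / 1e6:.1f} MB")  # 3.9 MB
print(f"3264x1840: {large / 1e6:.1f} MB")  # 72.1 MB
```

This only counts the input tensor. The activations and the masks the model produces come on top of it, so the file size on disk tells you very little about GPU memory use.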
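Simply add img = torchvision.transforms.functional.resize(img, 512) before converting the image to a tensor. With an integer size, the shorter side is scaled to 512 pixels and the aspect ratio is kept.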
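Roughly, the resize step could be slotted into your loading code like this. It is a sketch, not tested on your setup: both function names are hypothetical, and `shorter_side_resize_dims` only approximates the size rule as I understand it for torchvision 0.12 (shorter side set to `size`, longer side scaled and truncated to an integer).

```python
def shorter_side_resize_dims(width, height, size=512):
    """Approximate (width, height) after resize(img, size) with an int size."""
    if width <= height:
        return size, size * height // width
    return size * width // height, size


def load_resized_tensor(path, size=512):
    """Load an image, shrink it, and convert it to a float32 CHW tensor."""
    # Imported here so the helper above can be tried without PyTorch installed.
    from PIL import Image
    import torchvision.transforms.functional as F

    img = Image.open(path).convert("RGB")
    img = F.resize(img, size)  # shorter side -> `size`, aspect ratio kept
    return F.to_tensor(img)


print(shorter_side_resize_dims(3264, 1840))  # (908, 512)
```

At 908x512 the float32 input is about 5.6 MB instead of 72 MB, and the predicted masks come back at that smaller resolution too, so you would have to scale them up yourself if you need them at the original size.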
Another common trick is to cast the model and the image to float16 (half precision), which roughly halves the memory they need, but this may degrade the model's accuracy depending on what you're doing.
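For the reduced-precision route, one possible sketch is below. Note that it swaps in CUDA autocast (mixed precision) rather than a full cast with .half(): the weights and the input stay float32 and only the eligible operations run in float16, which I'd expect to be the less fragile option for a detection model. The function name `predict_mixed_precision` is hypothetical, it needs a CUDA GPU, and I have not measured how much memory it saves or how it affects accuracy for Mask R-CNN.

```python
def predict_mixed_precision(model, img_tensor, device):
    """Run one inference pass under CUDA autocast (mixed float16/float32)."""
    # Imported lazily so this definition loads even where PyTorch is absent.
    import torch

    model = model.to(device).eval()
    # no_grad skips autograd bookkeeping; autocast picks float16 per operation.
    with torch.no_grad(), torch.cuda.amp.autocast():
        return model([img_tensor.to(device)])
```

You would call it as predictions = predict_mixed_precision(model, img_tensor, device) in place of the torch.no_grad() block in the question.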