I'm trying to train YOLOR on the coco128 dataset in Google Colab. The training set contains 112 images, the validation set contains 8 images, and the test set contains 8 images.
But training throws a CUDA out of memory error. How can that be? The dataset has only 128 images in total.
Using torch 1.7.0 CUDA:0 (Tesla T4, 15109MB)
Namespace(adam=False, batch_size=8, bucket='', cache_images=False, cfg='cfg/yolor_p6.cfg', data='data/coco128.yaml', device='0', epochs=300, evolve=False, exist_ok=False, global_rank=-1, hyp='./data/hyp.scratch.1280.yaml', image_weights=False, img_size=[1280, 1280], local_rank=-1, log_imgs=16, multi_scale=False, name='yolor_p6', noautoanchor=False, nosave=False, notest=False, project='runs/train', rect=False, resume=False, save_dir='runs/train/yolor_p613', single_cls=False, sync_bn=False, total_batch_size=8, weights='', workers=8, world_size=1)
Start Tensorboard with "tensorboard --logdir runs/train", view at http://localhost:6006/
2021-07-29 13:35:48.259076: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Hyperparameters {'lr0': 0.01, 'lrf': 0.2, 'momentum': 0.937, 'weight_decay': 0.0005, 'warmup_epochs': 3.0, 'warmup_momentum': 0.8, 'warmup_bias_lr': 0.1, 'box': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.5, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mosaic': 1.0, 'mixup': 0.0}
Model Summary: 665 layers, 37265016 parameters, 37265016 gradients, 81.564040600 GFLOPS
Optimizer groups: 145 .bias, 145 conv.weight, 149 other
Scanning labels ../coco128/train2017.cache3 (110 found, 0 missing, 2 empty, 0 duplicate, for 112 images): 112it [00:00, 11214.18it/s]
Scanning labels ../coco128/val2017.cache3 (8 found, 0 missing, 0 empty, 0 duplicate, for 8 images): 8it [00:00, 4100.00it/s]
NumExpr defaulting to 2 threads.
Image sizes 1280 train, 1280 test
Using 2 dataloader workers
Logging results to runs/train/yolor_p613
Starting training for 300 epochs...
Epoch gpu_mem box obj cls total targets img_size
0% 0/14 [00:00<?, ?it/s]Traceback (most recent call last):
File "train.py", line 539, in <module>
train(hyp, opt, device, tb_writer, wandb)
File "train.py", line 289, in train
pred = model(imgs) # forward
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/MyDrive/YOLOR/yolor/models/models.py", line 543, in forward
return self.forward_once(x)
File "/content/drive/MyDrive/YOLOR/yolor/models/models.py", line 604, in forward_once
x = module(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/activation.py", line 394, in forward
return F.silu(input, inplace=self.inplace)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 1741, in silu
return torch._C._nn.silu(input)
RuntimeError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 14.76 GiB total capacity; 13.70 GiB already allocated; 67.75 MiB free; 13.76 GiB reserved in total by PyTorch)
0% 0/14 [00:03<?, ?it/s]
VRAM usage has nothing to do with how many train/val examples there are; it depends on the model, the image size, and the batch size. 1280x1280 is a very large training resolution: on a 16 GB GPU you will probably only be able to train this model at a batch size of 1 or 2.
Either use a lower resolution or a smaller model, use a GPU with more VRAM, or decrease your batch size, as in the example below.
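For example, you could relaunch training at a lower resolution and a smaller batch size. The flag names below are an assumption based on the Namespace keys in your log (batch_size, img_size, ...), so check them against python train.py --help before running:

python train.py --data data/coco128.yaml --cfg cfg/yolor_p6.cfg --hyp data/hyp.scratch.1280.yaml --img-size 640 640 --batch-size 4 --device 0

At 640x640 the activations take roughly a quarter of the memory they do at 1280x1280, so a batch size of 4 to 8 should fit comfortably in 16 GB.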
Also try mixed-precision training (AMP), e.g. PyTorch's built-in torch.cuda.amp or NVIDIA Apex, which runs most of the forward/backward pass in fp16 and noticeably reduces activation memory.
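If you want to wire AMP in yourself, here is a minimal sketch of the pattern using torch.cuda.amp (available in your torch 1.7.0). The names model, optimizer, dataloader and compute_loss are placeholders rather than YOLOR's actual objects; the point is only where autocast and GradScaler sit in the training loop:

import torch

scaler = torch.cuda.amp.GradScaler()

for imgs, targets in dataloader:
    imgs = imgs.to('cuda', non_blocking=True).float() / 255.0
    optimizer.zero_grad()
    # run the forward pass and loss in fp16 where it is numerically safe
    with torch.cuda.amp.autocast():
        pred = model(imgs)
        loss = compute_loss(pred, targets)
    # scale the loss so small fp16 gradients do not underflow
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # unscales gradients, then calls optimizer.step()
    scaler.update()

Weights stay in fp32, but activations and most of the compute run in half precision, which is usually enough to win back a batch size step or two on a T4.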