computer-vision object-detection yolo darknet

How to do Batch Detection on Darknet architecture?

I am trying to do a batch detection using the Darknet\YoloV4. It works for one batch, then the second batch fails with CUDA error. Am I missing something else on below snippet? And what are the right parameters for Batch for RTX GPU card, How to determine the right Batch size?

My system configuration is like below -

System:    Host: ubox Kernel: 5.4.0-42-generic x86_64 bits: 64
           Desktop: Gnome 3.28.4 Distro: Ubuntu 18.04.4 LTS
Machine:   Device: desktop System: Alienware product: Alienware Aurora R9 v: 1.0.7 serial: N/A
           Mobo: Alienware model: 0T76PD v: A01 serial: N/A
           UEFI: Alienware v: 1.0.7 date: 12/23/2019
CPU:       8 core Intel Core i7-9700K (-MCP-) cache: 12288 KB
           clock speeds: max: 4900 MHz 1: 800 MHz 2: 800 MHz 3: 800 MHz
           4: 800 MHz 5: 801 MHz 6: 803 MHz 7: 808 MHz 8: 810 MHz
Graphics:  Card-1: Intel Device 3e98
           Card-2: NVIDIA Device 1e84
           Display Server: x11 (X.Org 1.20.8 )
           drivers: modesetting,nvidia (unloaded: fbdev,vesa,nouveau)
           Resolution: 2560x1440@59.95hz, 1920x1080@60.00hz
           OpenGL: renderer: GeForce RTX 2070 SUPER/PCIe/SSE2
           version: 4.6.0 NVIDIA 450.57

I am getting CUDA Error: out of memory when I perform performBatchDetectV2() of batch size 3.

How to do Batching on Yolov4 architecture properly? In my use case, I am getting frames from the camera, I want to batch 10 frames into one and call below. The below function works perfectly if I call just once, meaning it throws Cuda error on the second batch of frames.

def performBatchDetectV2(image_list, thresh= 0.25, configPath = "./cfg/yolov4.cfg", weightPath = "yolov4.weights", metaPath= "./cfg/coco.data", hier_thresh=.5, nms=.45, batch_size=3):
    net = load_net_custom(configPath.encode('utf-8'), weightPath.encode('utf-8'), 0, batch_size)
    meta = load_meta(metaPath.encode('utf-8'))
    pred_height, pred_width, c = image_list[0].shape
    net_width, net_height = (network_width(net), network_height(net))
    img_list = []
    for custom_image_bgr in image_list:
        custom_image = cv2.cvtColor(custom_image_bgr, cv2.COLOR_BGR2RGB)
        custom_image = cv2.resize(
            custom_image, (net_width, net_height), interpolation=cv2.INTER_NEAREST)
        custom_image = custom_image.transpose(2, 0, 1)
        img_list.append(custom_image)

    arr = np.concatenate(img_list, axis=0)
    arr = np.ascontiguousarray(arr.flat, dtype=np.float32) / 255.0
    data = arr.ctypes.data_as(POINTER(c_float))
    im = IMAGE(net_width, net_height, c, data)

    batch_dets = network_predict_batch(net, im, batch_size, pred_width,
                                                pred_height, thresh, hier_thresh, None, 0, 0)
    batch_boxes = []
    batch_scores = []
    batch_classes = []
    for b in range(batch_size):
        num = batch_dets[b].num
        dets = batch_dets[b].dets
        if nms:
            do_nms_obj(dets, num, meta.classes, nms)
        boxes = []
        scores = []
        classes = []
        for i in range(num):
            det = dets[i]
            score = -1
            label = None
            for c in range(det.classes):
                p = det.prob[c]
                if p > score:
                    score = p
                    label = c
            if score > thresh:
                box = det.bbox
                left, top, right, bottom = map(int,(box.x - box.w / 2, box.y - box.h / 2,
                                            box.x + box.w / 2, box.y + box.h / 2))
                boxes.append((top, left, bottom, right))
                scores.append(score)
                classes.append(label)
                # boxColor = (int(255 * (1 - (score ** 2))), int(255 * (score ** 2)), 0)
                # cv2.rectangle(image_list[b], (left, top),
                #           (right, bottom), boxColor, 2)
        # cv2.imwrite(os.path.basename(img_samples[b]),image_list[b])

        batch_boxes.append(boxes)
        batch_scores.append(scores)
        batch_classes.append(classes)
    free_batch_detections(batch_dets, batch_size)
    return batch_boxes, batch_scores, batch_classes

Solution

The problem is, every time you did loaded the network when you call the performBatchDetectV2 method. So you have to load the network one time with constant batch size and use your loaded network for the prediction. set your netwrok variable as global. Because you are

This is your function

def performBatchDetectV2(image_list, thresh= 0.25, configPath = "./cfg/yolov4.cfg", weightPath = "yolov4.weights", metaPath= "./cfg/coco.data", hier_thresh=.5, nms=.45, batch_size=3):
    net = load_net_custom(configPath.encode('utf-8'), weightPath.encode('utf-8'), 0, batch_size)
    meta = load_meta(metaPath.encode('utf-8'))
    <your code>

Change to this

configPath = "./cfg/yolov4.cfg"
weightPath = "yolov4.weights"
metaPath= "./cfg/coco.data",
net = load_net_custom(configPath.encode('utf-8'), weightPath.encode('utf-8'), 0, batch_size)
meta = load_meta(metaPath.encode('utf-8'))
def performBatchDetectV2(image_list, thresh= 0.25, hier_thresh=.5, nms=.45, batch_size=3):
    global net, meta
    <your code>