Tags: python, opencv, gpu, object-detection, yolo

How do I run my YOLOv3 algorithm with OpenCV in Visual Studio 2022 using GPU?


I am new to computer vision and object detection, and I came across this exercise on performing object detection on videos with OpenCV, a pre-trained YOLOv3 model, and the COCO dataset. I am currently using Visual Studio 2022 with the latest opencv-python and the needed files (the weights, cfg, and names files). The algorithm works; however, the output video displays at a low frame rate. Most of the advice I found online suggests using a GPU.

In my current setup I have a CPU: Intel(R) Core(TM) i5-4570 @ 3.20GHz and a GPU: NVIDIA GeForce GTX 650 Ti BOOST.

I am planning on making a custom dataset for training later on. Is there any way I can perform my object detection and training using the GPU in Visual Studio 2022? If so, what steps do I need to follow? Thanks in advance.

Python code:

import cv2
import numpy as np

# Load YOLO
net = cv2.dnn.readNet("YOLO_COCO/yolov3.weights", "YOLO_COCO/yolov3.cfg")
classes = []
with open("YOLO_COCO/coco.names","r") as f:
    classes = [line.strip() for line in f.readlines()]
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
colors = np.random.uniform(0, 255, size=(len(classes), 3))

# Load Video
cap = cv2.VideoCapture('videos/overpass.mp4')

while True:
    ret, img = cap.read()
    if not ret:  # end of video or read failure; cap.read() returns (False, None)
        break

    height, width, channels = img.shape

    # Detecting Objects
    blob = cv2.dnn.blobFromImage(img, 1/255, (416,416), (0, 0, 0), swapRB=True, crop=False) 

    net.setInput(blob)
    outs = net.forward(output_layers)


    # Show information on the screen
    # Initialise variables
    class_ids = []
    confidences = [] 
    boxes = []

    for out in outs:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence>0.5:
                # Object detected
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)

                # Rectangle coordinates
                x = int(center_x - w/2)
                y = int(center_y - h/2)

                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    # Non-Maximum Suppression (keep one box per object)
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

    # Label object
    font = cv2.FONT_HERSHEY_PLAIN
    for i in range(len(boxes)):
        if i in indexes:
            x, y, w, h = boxes[i]
            label = str(classes[class_ids[i]])
            confidence = str(round(confidences[i], 2))
            # Color per class: indexing by box index can overrun the colors
            # array when there are more boxes than classes
            color = colors[class_ids[i]]
            cv2.rectangle(img, (x,y), (x+w, y+h), color, 2)
            cv2.putText(img, label + "  " + confidence, (x, y+20), font, 1, color, 2)

    cv2.imshow("Video", img)
    key = cv2.waitKey(1)
    if key == 27:
        break

cap.release()
cv2.destroyAllWindows()
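Before reaching for a GPU, it helps to measure exactly how slow the loop is. This is a minimal sketch of a small helper (not part of the original code) that you can call once per iteration to get a running frames-per-second estimate, and overlay with `cv2.putText` if you like:

```python
import time

class FPSCounter:
    """Tracks a running frames-per-second estimate over the last `window` frames."""
    def __init__(self, window=30):
        self.window = window
        self.timestamps = []

    def tick(self, now=None):
        """Record one frame; return the current FPS estimate (0.0 until 2 frames seen)."""
        if now is None:
            now = time.perf_counter()
        self.timestamps.append(now)
        if len(self.timestamps) > self.window:
            self.timestamps.pop(0)  # keep only the most recent `window` frames
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Usage inside the loop would be `fps = counter.tick()` right after `net.forward(...)`; comparing the number before and after enabling the CUDA backend tells you how much the GPU actually buys you.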

Solution

  • Step 1

    Make sure your OpenCV build is bound to CUDA. The stock opencv-python wheel from pip is CPU-only, so you need an OpenCV build compiled with CUDA support. Since you are using Visual Studio, follow a guide for building OpenCV with CUDA on Windows; equivalent guides exist for Linux.
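    A quick way to verify which kind of build you have — a minimal sketch assuming an OpenCV ≥ 4.2 Python build (a plain pip opencv-python wheel will simply report 0 devices):

    ```python
    import cv2

    def cuda_device_count():
        """Number of CUDA devices visible to this OpenCV build; 0 on a CPU-only wheel."""
        try:
            return cv2.cuda.getCudaEnabledDeviceCount()
        except (AttributeError, cv2.error):
            return 0  # very old builds may lack the cv2.cuda module entirely

    if cuda_device_count() > 0:
        print("CUDA-enabled build detected")
    else:
        print("CPU-only build - the CUDA backend will not be available")
        # The full compile flags are listed in cv2.getBuildInformation()
    ```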

    Step 2

    Put these two lines right after loading the network, before the loop starts:

    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    

    Tadaa, you're done!
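    Putting Step 1 and Step 2 together, here is a defensive sketch (`setPreferableBackend` does not raise on an unsupported backend — OpenCV only falls back with a warning at `forward()` time — so the device check has to happen up front):

    ```python
    import cv2

    def configure_backend(net):
        """Prefer the CUDA backend when a CUDA-enabled build and device are present,
        otherwise explicitly keep the default OpenCV CPU backend."""
        if cv2.cuda.getCudaEnabledDeviceCount() > 0:
            net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
            net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
            return "cuda"
        net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
        net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
        return "cpu"
    ```

    Call it as `configure_backend(net)` right after `cv2.dnn.readNet(...)`; the return value tells you which backend was actually selected.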