python, computer-vision, object-detection, yolo, yolov8

How do I work with the result of model.predict in YOLOv8?


I have my webcam set up as the input for my model.predict() function and want to trigger some code if the function detects a certain object. However, model.predict() never seems to terminate when using a webcam, which makes this impossible. Just wondering what a solution to this could be.

from ultralytics import YOLO
from ultralytics.yolo.v8.detect.predict import DetectionPredictor
import cv2
print('hi')

model = YOLO("C:/Users/User/Downloads/best.pt")
outs = model.predict(source="0", show=True)

print('hey')
# hi gets printed but not hey

If I include the parameter verbose=True in the predict function, the information I need is printed to the terminal, but I do not know how to access it in a variable so I can trigger more code. Perhaps multi-threading could help, but surely there is a simpler method?


Solution

  • The problem is not in your code; it is in the hydra package used inside the Ultralytics package.

    It treats the "0" passed to "source" as a null value, so it gets no input and predicts on the default assets instead. If you try it with any local image, or an image on the web, the code works normally.
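    For example, a quick sanity check on a file path (the image name here is just a placeholder) should return normally and give you a list of results instead of hanging:

    from ultralytics import YOLO

    model = YOLO("C:/Users/User/Downloads/best.pt")
    results = model.predict(source="test_image.jpg", show=True)  # any local image path works
    print(len(results))  # one Results object per input image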

    You can try this workaround:

    from ultralytics import YOLO
    import cv2

    model = YOLO("model.pt")
    camera = cv2.VideoCapture(0)  # open the default webcam
    img_counter = 0

    while True:
        ret, frame = camera.read()

        if not ret:
            print("failed to grab frame")
            break
        cv2.imshow("test", frame)

        k = cv2.waitKey(1)
        if k % 256 == 27:
            # ESC pressed
            print("Escape hit, closing...")
            break
        elif k % 256 == 32:
            # SPACE pressed: save the current frame and run inference on it
            img_path = "path/opencv_frame_{}.png".format(img_counter)
            cv2.imwrite(img_path, frame)
            outs = model.predict(img_path)
            img_counter += 1

    camera.release()
    cv2.destroyAllWindows()


    So what we are doing here is writing the frame to an image file and then running inference on that file.
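
    To actually trigger some code when a certain object shows up (the original goal), you can inspect outs right after the model.predict(img_path) call. A rough sketch, where the class name "person" is just a placeholder for whatever your model detects:

    TARGET_CLASS = "person"  # placeholder: the object you want to react to

    for result in outs:              # model.predict() returns a list of Results objects
        for box in result.boxes:     # one entry per detection
            cls_id = int(box.cls[0])  # class index of this detection
            if model.names[cls_id] == TARGET_CLASS:
                print("Detected {} with confidence {:.2f}".format(TARGET_CLASS, float(box.conf[0])))
                # ... trigger your follow-up code here ...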

    You can try the following if you want to save the frame only when there is a detection:

    inputs = [frame]  # or, if you have multiple images: [frame1, frame2, ...]
    results = model(inputs)  # perform inference; returns a list of Results objects

    if len(results[0].boxes) > 0:  # only save if something was actually detected
        cv2.imwrite(img_path, frame)
    for result in results:
        boxes = result.boxes  # Boxes object for the bbox outputs
    # Do something with the bounding boxes
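
    As a concrete example of that last comment, each detection in result.boxes exposes its coordinates and confidence; a minimal sketch using the Boxes attributes xyxy and conf:

    for result in results:
        for box in result.boxes:
            x1, y1, x2, y2 = box.xyxy[0].tolist()  # corner coordinates in pixels
            print("box at ({:.0f}, {:.0f}, {:.0f}, {:.0f}), confidence {:.2f}".format(
                x1, y1, x2, y2, float(box.conf[0])))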