python, python-3.x, object-detection, yolo, yolov8

Obtaining detected object names using YOLOv8


We are trying to get the detected object names using Python and YOLOv8 with the following code.

import cv2
from ultralytics import YOLO


def main():
    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

    model = YOLO("yolov8n.pt")

    while True:
        ret, frame = cap.read()
        result = model(frame, agnostic_nms=True)[0]

        print(result)

        if cv2.waitKey(30) == 27:
            break

    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    main()

Two kinds of output appear in the log. The first is the summary that YOLO prints on its own:

0: 384x640 1 person, 151.2ms
Speed: 0.6ms preprocess, 151.2ms inference, 1.8ms postprocess per image at shape (1, 3, 640, 640)

The second is the output of our print(result) call, shown below. How do we extract "person" from it? Presumably we can get "person" by passing 0 to names, but where does the 0 come from?

ultralytics.yolo.engine.results.Results object with attributes:

boxes: ultralytics.yolo.engine.results.Boxes object
keypoints: None
keys: ['boxes']
masks: None
names: {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}
orig_img: array([[[51, 58, 64],
        [52, 59, 65],
        [54, 59, 65],
        ...,
        [64, 68, 74],
        [62, 67, 73],
        [62, 67, 73]],

       [[51, 58, 64],
        [53, 59, 65],
        [54, 59, 65],
        ...,
        [63, 68, 74],
        [62, 67, 73],
        [62, 67, 73]],

       [[53, 58, 64],
        [53, 58, 64],
        [53, 58, 64],
        ...,
        [61, 67, 73],
        [61, 67, 73],
        [61, 67, 73]],

       ...,

       [[43, 48, 58],
        [42, 47, 57],
        [41, 46, 56],
        ...,
        [24, 35, 49],
        [23, 34, 48],
        [23, 34, 48]],

       [[44, 48, 59],
        [43, 47, 57],
        [42, 46, 56],
        ...,
        [26, 35, 49],
        [26, 35, 49],
        [24, 33, 48]],

       [[45, 48, 59],
        [43, 45, 56],
        [40, 43, 54],
        ...,
        [26, 35, 49],
        [26, 35, 49],
        [25, 33, 48]]], dtype=uint8)
orig_shape: (720, 1280)
path: 'image0.jpg'
probs: None
speed: {'preprocess': 1.6682147979736328, 'inference': 79.47301864624023, 'postprocess': 1.0020732879638672}
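
In other words, we expect the lookup itself to be a plain dictionary access, roughly like the sketch below; what we do not know is where the class id comes from (class_id here is only a placeholder, not an attribute we have found):

# what we imagine: names maps an integer class id to a label
class_id = 0                      # placeholder -- where does this value come from?
name = result.names[class_id]     # -> 'person'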

We would like a solution along these lines, but if that is not possible, any other approach that combines Python and YOLOv8 is fine. Our goal is to display bounding boxes and object names.

Additional Information

We changed the code as follows.

        ret, frame = cap.read()
        # result = model(frame, agnostic_nms=True)[0]
        result = model([frame])[0]

        boxes = result.boxes
        masks = result.masks
        probs = result.probs

        print("[boxes]==============================")
        print(boxes)
        print("[masks]==============================")
        print(masks)
        print("[probs]==============================")
        print(probs)

Even so, the word "person" does not appear anywhere in the output below. How should we determine the class name?

[boxes]==============================
WARNING ⚠️ 'Boxes.boxes' is deprecated. Use 'Boxes.data' instead.
ultralytics.yolo.engine.results.Boxes object with attributes:

boxes: tensor([[4.7356e+01, 7.2858e+00, 1.1974e+03, 7.1092e+02, 8.6930e-01, 0.0000e+00]])
cls: tensor([0.])
conf: tensor([0.8693])
data: tensor([[4.7356e+01, 7.2858e+00, 1.1974e+03, 7.1092e+02, 8.6930e-01, 0.0000e+00]])
id: None
is_track: False
orig_shape: tensor([ 720, 1280])
shape: torch.Size([1, 6])
xywh: tensor([[ 622.4028,  359.1004, 1150.0942,  703.6293]])
xywhn: tensor([[0.4863, 0.4988, 0.8985, 0.9773]])
xyxy: tensor([[  47.3557,    7.2858, 1197.4500,  710.9150]])
xyxyn: tensor([[0.0370, 0.0101, 0.9355, 0.9874]])
[masks]==============================
None
[probs]==============================
None

Solution

  • There are probably better solutions to this, but I couldn't really find anything useful either, so I did this:

    while True:
        ret, frame = cap.read()
        results = model(frame, agnostic_nms=True)[0]
    
        if not results or len(results) == 0:
            continue
    
        for result in results:
    
            detection_count = result.boxes.shape[0]
    
            for i in range(detection_count):
                cls = int(result.boxes.cls[i].item())
                name = result.names[cls]
                confidence = float(result.boxes.conf[i].item())
                bounding_box = result.boxes.xyxy[i].cpu().numpy()
    
                x = int(bounding_box[0])
                y = int(bounding_box[1])
                width = int(bounding_box[2] - x)
                height = int(bounding_box[3] - y)
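
    The snippet above ends with the box geometry. From here, a straightforward way to finish the loop is to draw each box and its label inside the inner for loop and then show the frame once per while iteration; the color, font, and window name below are arbitrary choices:

                    # inside the inner for-loop: draw the box and a "name confidence" label
                    cv2.rectangle(frame, (x, y), (x + width, y + height), (0, 255, 0), 2)
                    cv2.putText(frame, f"{name} {confidence:.2f}", (x, y - 10),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

            # back at the while-loop level: show the annotated frame, Esc to quit
            cv2.imshow("YOLOv8", frame)
            if cv2.waitKey(30) == 27:
                break

    The key point is the same as in the printed output: result.boxes.cls holds the class index for each detection, and result.names maps that index to the label such as "person".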