YOLOv8 model class prediction on the frames of a video

I need help with this for my final thesis project. My model has two classes, inheat and non-inheat. I wrote the code below, but when it predicts on the frames of a video, every frame with a detection is added to the in-heat frames, even when the model detects non-inheat, and nothing ends up in the non-in-heat frames. I might be sorting the frames into the two lists incorrectly.

Code:

def process_video_with_second_model(video_path):
    cap = cv2.VideoCapture(video_path)
    class_counts = {'inheat': 0, 'non-inheat': 0}

    in_heat_frames = []
    non_in_heat_frames = []

    while True:
        ret, frame = cap.read()
        if frame is None:
            break # Break the loop when no more frames are available

        # Resize the frame to a smaller size (e.g., 400x400)
        frame_small = cv2.resize(frame, (400, 400))

        # Use the second model to detect in-heat behavior
        results_in_heat = yolov8_model_in_heat.predict(source=frame_small, show=True, conf=0.8)

        # Print results to inspect structure
        for results_in_heat_instance in results_in_heat:
            # Access bounding box coordinates
            boxes = results_in_heat_instance.boxes

            # Only handle frames with at least one detection (the confidence threshold is set in predict above)
            if len(boxes) > 0:
                class_name = results_in_heat_instance.names[0]

                # Use a dictionary to store the counts for each class
                class_counts[class_name] += 1

                # Add the frame to the corresponding list based on the class name
                if class_name == 'non-inheat':
                    non_in_heat_frames.append(frame)
                elif class_name == 'inheat':
                    in_heat_frames.append(frame)

        print(f"Class Counts: {class_counts}")

        # Check if either condition is met (50 frames for inheat and 50 frames for non-inheat)
        if class_counts['inheat'] >= 50 and class_counts['non-inheat'] >= 50:
            break

    # Release resources for the second model
    cap.release()
    cv2.destroyAllWindows()

    # Stack the in-heat and non-in-heat frames vertically
    stacked_in_heat_frames = np.vstack(in_heat_frames)
    stacked_non_in_heat_frames = np.vstack(non_in_heat_frames)

    # Display the stacked in-heat and non-in-heat frames
    cv2.imshow('Stacked In-Heat Frames', stacked_in_heat_frames)
    cv2.imshow('Stacked Non-In-Heat Frames', stacked_non_in_heat_frames)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    # Compare the counts and return the label with the higher count
    if class_counts['inheat'] > class_counts['non-inheat']:
        return 'inheat'
    elif class_counts['non-inheat'] > class_counts['inheat']:
        return 'non-inheat'

I have read the YOLOv8 prediction documentation but still can't get it to work. I am running this in the PyCharm IDE.

Solution

The problem is in this line: class_name = results_in_heat_instance.names[0]. The result's names attribute is the full dictionary of your model's class names, mapping each class index to its name. It is the same no matter what the model detected in a frame, so names[0] always returns the same class (in your case apparently 'inheat'), which is why every frame lands in the in-heat list. To get the class name of each detected object in a frame, iterate over the boxes and read the cls value of each box, which is the detected class index into that names dictionary. A sample looks like this:

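results = model.predict(frame)
for result in results:
    # result.names is the same for every frame; the detections are in result.boxes
    for box in result.boxes:
        # box.cls holds the predicted class index of this box
        class_id = int(box.cls.item())
        class_name = result.names[class_id]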
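
Applied to the function in the question, the same idea could be sketched as below. This is only a sketch under some assumptions: the helper names frame_label and sort_frame are hypothetical, and labelling each frame by its most confident box is one possible choice (the question does not say what should happen when a frame contains both classes). The helpers only rely on result.boxes, box.cls, box.conf and result.names.

```python
def frame_label(result):
    """Return the class name of the most confident box in one result,
    or None if the frame has no detections.
    (Hypothetical helper; picking the top-confidence box is an assumption.)"""
    best_name, best_conf = None, -1.0
    for box in result.boxes:
        conf = float(box.conf.item())
        if conf > best_conf:
            best_conf = conf
            # Look up the name by the box's own class index, not names[0]
            best_name = result.names[int(box.cls.item())]
    return best_name


def sort_frame(frame, result, frames_by_class, class_counts):
    """Count the frame and append it to the list that matches its detected class."""
    label = frame_label(result)
    if label is not None:
        class_counts[label] = class_counts.get(label, 0) + 1
        frames_by_class.setdefault(label, []).append(frame)
    return label
```

Inside the loop from the question, the body of the for loop over results_in_heat would then reduce to a single call such as sort_frame(frame, results_in_heat_instance, frames_by_class, class_counts), with frames_by_class being a dictionary that replaces the two separate lists.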