Remove the boxes but keep the labels for a YOLOv8 prediction

I am working on a project in which I trained a YOLOv8 segmentation model (Ultralytics library, version 8.0.132) to detect specific objects in images. My classes are [A, B, C, D].

In my Jupyter Notebook, I have the following code:

from ultralytics import YOLO

model_path = "{pathToModel}/best.pt"
print("Loading model...")
model = YOLO(model_path)

result = model.predict("{pathToImage}.png", conf=0.3, save=True)
print(result)

# Checking the presence of predictions
for r in result:
    print(r.masks)

This saves an image that shows the predicted segments together with their bounding boxes, labels, and probabilities (e.g., class B with a probability of 0.61).
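The label drawn on the saved image is built from each detection's class ID and confidence. As a minimal sketch, assuming the values have been copied out of a result `r` with `r.boxes.cls.cpu().numpy()` and `r.boxes.conf.cpu().numpy()`, and that `r.names` maps class IDs to names, the same "name confidence" strings can be produced and printed without looking at the image at all. The helper `format_labels` is hypothetical, and the input values in the example are made up for illustration.

```python
import numpy as np


def format_labels(cls_ids, confs, names, min_conf=0.0):
    """Build 'name conf' strings like the labels drawn on the saved image.

    cls_ids, confs: 1-D sequences (e.g. r.boxes.cls / r.boxes.conf moved to NumPy).
    names: dict mapping class ID -> class name (e.g. r.names).
    """
    labels = []
    for cls_id, conf in zip(np.asarray(cls_ids), np.asarray(confs)):
        if conf < min_conf:
            continue  # skip detections below the chosen threshold
        labels.append(f"{names[int(cls_id)]} {float(conf):.2f}")
    return labels


names = {0: "A", 1: "B", 2: "C", 3: "D"}
print(format_labels([1.0, 3.0], [0.61, 0.2], names, min_conf=0.3))  # ['B 0.61']
```

In the notebook this would be called once per result, e.g. `format_labels(r.boxes.cls.cpu().numpy(), r.boxes.conf.cpu().numpy(), r.names)` for each `r` in `result`.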

However, when I change the predict call to exclude the bounding boxes (boxes=False), following the documentation (https://docs.ultralytics.com/de/modes/predict/), the saved image shows the segments without the boxes but, crucially, also without the labels and probabilities.

Trying to bring the labels and probabilities back with labels=True (the same happens with show_labels) and probs=True raises the following error:

Traceback (most recent call last):

  File c:\Users\myuser\AppData\Local\Programs\Python\Python311\Lib\site-packages\IPython\core\interactiveshell.py:3442 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  Cell In[39], line 1
    result = model.predict("{pathToMyImage}.png" , conf = 0.3, save = True, boxes=False, labels=True, probs=True)

  File c:\Users\myuser\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\_contextlib.py:115 in decorate_context
    return func(*args, **kwargs)

  File c:\Users\myuser\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\yolo\engine\model.py:249 in predict
    self.predictor = TASK_MAP[self.task][3](overrides=overrides, _callbacks=self.callbacks)

  File c:\Users\myuser\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\yolo\v8\segment\predict.py:13 in __init__
    super().__init__(cfg, overrides, _callbacks)

  File c:\Users\myuser\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\yolo\engine\predictor.py:86 in __init__
    self.args = get_cfg(cfg, overrides)

  File c:\Users\myuser\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\yolo\cfg\__init__.py:114 in get_cfg
    check_cfg_mismatch(cfg, overrides)

  File c:\Users\myuser\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\yolo\cfg\__init__.py:187 in check_cfg_mismatch
    raise SyntaxError(string + CLI_HELP_MSG) from e
...
    Docs: https://docs.ultralytics.com
    Community: https://community.ultralytics.com
    GitHub: https://github.com/ultralytics/ultralytics
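The traceback ends in `check_cfg_mismatch` with a `SyntaxError`, which points to argument validation rather than plotting: `predict()` in this version appears to accept only keys defined in its default configuration, while `labels` and `probs` look like arguments of `Results.plot()` instead. The sketch below is a simplified, hypothetical re-creation of that kind of check in plain Python, not Ultralytics code; `check_overrides` and `VALID_PREDICT_ARGS` are invented for illustration, and the key set is a small illustrative subset rather than the real 8.0.132 list.

```python
from difflib import get_close_matches

# Illustrative subset of accepted keys; NOT the real default config of 8.0.132.
VALID_PREDICT_ARGS = {"conf", "save", "boxes", "show", "line_width", "retina_masks"}


def check_overrides(overrides, valid=VALID_PREDICT_ARGS):
    """Raise SyntaxError listing every override key that is not a known argument."""
    problems = []
    for key in overrides:
        if key not in valid:
            hints = get_close_matches(key, sorted(valid))
            hint = f" Similar arguments: {hints}." if hints else ""
            problems.append(f"'{key}' is not a valid argument.{hint}")
    if problems:
        raise SyntaxError("\n".join(problems))


check_overrides({"conf": 0.3, "save": True, "boxes": False})  # passes silently
try:
    check_overrides({"conf": 0.3, "boxes": False, "labels": True, "probs": True})
except SyntaxError as err:
    print(err)  # one line per rejected key: 'labels' and 'probs'
```

The design point is that validation happens before any drawing code runs, so an unknown key fails the whole call instead of being ignored.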

It seems that the combination of

show_boxes=False,
show_conf=True,
show_labels=True

does not work either (as described in the question "Unable to hide bounding boxes and labels in YOLOv8"). show_boxes throws an error, and boxes more or less overrides every other argument I set.

I followed the documentation but am struggling to display both the segment and its associated label/probability without the bounding box. Any insights or suggestions on how to achieve this would be greatly appreciated.
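One plausible explanation for boxes=False overriding the label settings: as far as I can tell from the 8.0.x source of `Results.plot()`, the label text is drawn by the same call that draws each box (`Annotator.box_label`), inside the branch that only runs when boxes are enabled. The plain-Python model below is a simplified, hypothetical illustration of that control flow, not Ultralytics code; `plan_drawing` and its operation tuples are invented names.

```python
def plan_drawing(detections, show_masks=True, show_boxes=True, show_labels=True, show_conf=True):
    """Return the drawing operations a box-coupled plotter would perform.

    detections: list of (class_name, confidence) pairs.
    """
    ops = []
    if show_masks:
        ops.extend(("mask", name) for name, _ in detections)
    if show_boxes:  # labels are only ever produced inside this branch
        for name, conf in detections:
            label = None
            if show_labels:
                label = f"{name} {conf:.2f}" if show_conf else name
            ops.append(("box+label", name, label))
    return ops


dets = [("B", 0.61)]
print(plan_drawing(dets))                    # [('mask', 'B'), ('box+label', 'B', 'B 0.61')]
print(plan_drawing(dets, show_boxes=False))  # [('mask', 'B')]: the label disappears too
```

If this reading is right, no combination of label or confidence flags can bring the text back while boxes are off, so the labels have to be drawn separately, for example from `r.boxes.cls`, `r.boxes.conf` and `r.names`.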

Solution

I wrote a small Python script that draws the segments myself and adds the labels and confidence values. This is only a workaround, though; if there is a simpler solution using the arguments mentioned above, feel free to add it.
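Since the goal is to draw the segments yourself, one option is to start from the polygons in `masks.xy` instead of the raw `masks.data` tensor. Assuming `masks.xy` behaves as in the 8.0.x documentation (one N x 2 array of x, y points per object, already in original-image pixel coordinates), no resizing is needed; `cv2.fillPoly` is the usual way to rasterize such polygons. The NumPy-only sketch below does the same with an even-odd point-in-polygon test on pixel centers, so it can be checked without OpenCV. `polygon_to_mask` is a hypothetical helper, and because it loops over edges with full-image arrays it is slow on large images and mainly shows the idea.

```python
import numpy as np


def polygon_to_mask(polygon, height, width):
    """Rasterize one polygon (N x 2 array of x, y pixel coords) into a boolean mask.

    Uses the even-odd rule on pixel centers: a pixel is inside if a horizontal
    ray from its center crosses the polygon outline an odd number of times.
    """
    poly = np.asarray(polygon, dtype=float)
    mask = np.zeros((height, width), dtype=bool)
    if len(poly) < 3:
        return mask  # degenerate polygon: nothing to fill
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5  # pixel centers
    x0, y0 = poly[:, 0], poly[:, 1]
    x1, y1 = np.roll(x0, -1), np.roll(y0, -1)  # next vertex, closing the polygon
    for ax, ay, bx, by in zip(x0, y0, x1, y1):
        # Does this edge span the pixel center's row?
        crosses = (ay > py) != (by > py)
        # x where the edge meets that row (horizontal edges never "cross")
        with np.errstate(divide="ignore", invalid="ignore"):
            x_at_y = ax + (py - ay) * (bx - ax) / (by - ay)
        mask ^= crosses & (px < x_at_y)  # toggle pixels left of the crossing
    return mask


square = [(1, 1), (4, 1), (4, 4), (1, 4)]
print(polygon_to_mask(square, 6, 6).astype(int))
```

With a result `r`, a call might look like `polygon_to_mask(r.masks.xy[i], *r.orig_shape)`, assuming `orig_shape` is `(height, width)`.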

The code:

import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from torchvision.transforms import functional as F

image_path = "{pathToImage}.png"  # same image that was passed to model.predict
prediction_results = result[0]

# Load the original image, make sure it is RGB and convert it to a NumPy array
original_image = Image.open(image_path)
if original_image.mode != 'RGB':
    original_image = original_image.convert('RGB')
original_image = np.array(original_image)
display_image = original_image.copy()

if prediction_results.masks is not None:
    # Run through each detection, color its mask area and add the label text
    for mask_tensor, box in zip(prediction_results.masks.data, prediction_results.boxes):
        # Scale the mask to the size of the original image and binarize it;
        # move it to the CPU so it can index the NumPy image even if inference ran on a GPU
        resized_mask = F.resize(mask_tensor.unsqueeze(0), list(display_image.shape[:2]), antialias=True).squeeze() > 0.5
        resized_mask = resized_mask.cpu().numpy()

        # Apply the color marker (pure red in RGB) to the mask area
        display_image[resized_mask] = (255, 0, 0)

        # Extract the class ID and confidence from the box
        class_id = int(box.cls[0])        # class ID
        confidence = float(box.conf[0])   # confidence as a plain Python float
        class_name = prediction_results.names[class_id]  # class name based on the ID

        # Add the text at the top-left corner of the (undrawn) bounding box
        label = f"{class_name}: {confidence:.2f}"
        coords = box.xyxy.cpu().numpy()  # shape (1, 4): x1, y1, x2, y2
        x1, y1 = int(coords[0][0]), int(coords[0][1])
        cv2.putText(display_image, label, (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
else:
    print("No masks found or mask recognition not supported.")

# Convert to a PIL image and display it
display_image = Image.fromarray(display_image)
plt.imshow(display_image)
plt.axis('off')
plt.show()
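The workaround paints each mask solid red, which hides the object underneath, and places the label at the corner of a box that is no longer drawn. A common alternative is to alpha-blend a color over the mask and anchor the label on the segment itself. The NumPy sketch below shows both; `overlay_mask` and `label_anchor` are hypothetical helpers, and using the top-most mask pixel as the anchor is just one reasonable choice (the mask centroid is another).

```python
import numpy as np


def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend `color` over the pixels where `mask` is True; returns a new uint8 image.

    image: H x W x 3 uint8 RGB array, mask: H x W boolean array.
    """
    out = image.astype(np.float32)  # work on a float copy; the input is untouched
    out[mask] = (1.0 - alpha) * out[mask] + alpha * np.asarray(color, dtype=np.float32)
    return np.clip(out, 0, 255).round().astype(np.uint8)


def label_anchor(mask):
    """Return (x, y) of the top-most (then left-most) mask pixel, or None if empty."""
    ys, xs = np.nonzero(mask)  # row-major order, so index 0 is in the top row
    if ys.size == 0:
        return None
    top = np.argmin(ys)
    return int(xs[top]), int(ys[top])
```

Inside the loop above, `display_image = overlay_mask(display_image, resized_mask)` would replace the solid fill, and `anchor = label_anchor(resized_mask)` could be passed to `cv2.putText` as the text origin when it is not `None`.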