I used detectron2 to predict where an object is located in an image. Now I'm trying to use the predicted box to crop the image (in my use case only one object/box is detected per image). The relevant part of my code is below. The problem is that it only crops the left side of the image, but I need it to crop the top, right and bottom as well so that the result matches the detected object's box. The original images have shape (x, y, 3), so they are RGB images. What am I missing?
from detectron2.utils.visualizer import ColorMode
import glob
imageName = "my_img.jpg"
im = cv2.imread(imageName)
outputs = predictor(im)
v = Visualizer(im[:, :, ::-1], metadata=test_metadata, scale=0.8)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(out.get_image()[:, :, ::-1])
boxes = outputs["instances"].pred_boxes
boxes = list(boxes)[0].detach().cpu().numpy()
# extract the bounding box coordinates
(x, y) = (int(boxes[0]), int(boxes[1]))
(w, h) = (int(boxes[2]), int(boxes[3]))
crop_img = image[x:y+h, y:x+w]
cv2_imshow(crop_img)
I also tried the following but it trimmed too much of the image from the top and didn't trim the right or bottom of the image at all.
from detectron2.data.transforms import CropTransform
ct = CropTransform(x, y, w, h)
crop_img = ct.apply_image(image)
cv2_imshow(crop_img)
Playing around with it, I was able to crop the image around the detected box with the following but it's not ideal since I had to hardcode it.
crop_img = image[y-40:y+h-390, x:x+w-395]
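The following should work. Detectron2's pred_boxes stores each box as corner coordinates (x1, y1, x2, y2), not as (x, y, width, height), so the last two values are already the bottom-right corner and should not be added to the first two.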
def crop_object(image, box):
    """Crops an object in an image

    Inputs:
        image: PIL image
        box: one box from Detectron2 pred_boxes, as (x1, y1, x2, y2)
    """
    x_top_left = box[0]
    y_top_left = box[1]
    x_bottom_right = box[2]
    y_bottom_right = box[3]

    # PIL's crop takes a (left, upper, right, lower) tuple of pixel coordinates
    crop_img = image.crop((int(x_top_left), int(y_top_left), int(x_bottom_right), int(y_bottom_right)))
    return crop_img
# Get pred_boxes from Detectron2 prediction outputs
boxes = outputs["instances"].pred_boxes
# Select 1 box:
box = list(boxes)[0].detach().cpu().numpy()
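# crop_object expects a PIL image, so convert the BGR array from cv2.imread (im) to RGB first
from PIL import Image
image = Image.fromarray(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))
# Crop the PIL image using predicted box coordinates
crop_img = crop_object(image, box)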
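If you would rather stay with the NumPy array that cv2.imread returns, the same crop can be done with plain slicing. The thing to watch is that NumPy indexes rows (y) first and columns (x) second, which is why image[x:y+h, y:x+w] in the question cuts the wrong region. Below is a minimal sketch of that idea. The helper name crop_array and the dummy array and box are made up for illustration, and the clamping to the image bounds is an extra precaution I added, not something Detectron2 requires.

```python
import numpy as np

def crop_array(image, box):
    """Crop an H x W x C NumPy image to a box given as (x1, y1, x2, y2)."""
    height, width = image.shape[:2]
    # Box values are floats; truncate to ints, as crop_object does
    x1, y1, x2, y2 = (int(v) for v in box)
    # Clamp to the image bounds so a slightly oversized box cannot misbehave
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    # Rows (y) come first, columns (x) second
    return image[y1:y2, x1:x2]

# Stand-in data: a blank 100 x 200 image and a hypothetical predicted box
dummy = np.zeros((100, 200, 3), dtype=np.uint8)
dummy_box = np.array([20.4, 10.6, 120.2, 60.9])
crop = crop_array(dummy, dummy_box)
print(crop.shape)
```

With the question's variables this would be called as crop_array(im, box), and the result can be passed straight to cv2_imshow.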