Tags: python, tensorflow, image-processing, object-detection-api

Crop and Select Only the Detected Region from an Image in Python


I have used the TensorFlow Object Detection API to detect hands in images. Using the provided sample code (object_detection_tutorial.ipynb), I have been able to draw bounding boxes on the images. Is there any way to select only the detected region (the area inside a bounding box) and get it as an image?

For example,

Sample Input Image: [image]

TensorFlow Output: [image]

What I Want: [image] [image]

The Object Detection API sample code can be found here: https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb

Any help would be highly appreciated!


Solution

  • Yes. In the tutorial, the variable output_dict holds what you need. Look at the arguments passed to vis_util.visualize_boxes_and_labels_on_image_array: they contain the boxes, scores, etc.

    First, get the image shape, because the box coordinates are normalized to the range [0, 1].

    img_height, img_width, img_channel = image_np.shape
    

    Then convert the box coordinates to absolute pixel values:

    absolute_coord = []
    THRESHOLD = 0.7  # adjust your threshold here
    N = len(output_dict['detection_boxes'])
    for i in range(N):
        # skip detections whose confidence is below the threshold
        if output_dict['detection_scores'][i] < THRESHOLD:
            continue
        # each box is stored as normalized [ymin, xmin, ymax, xmax]
        box = output_dict['detection_boxes'][i]
        ymin, xmin, ymax, xmax = box
        x_up = int(xmin*img_width)
        y_up = int(ymin*img_height)
        x_down = int(xmax*img_width)
        y_down = int(ymax*img_height)
        absolute_coord.append((x_up,y_up,x_down,y_down))
    

    Then you can use NumPy slicing to get the image area inside each bounding box:

    bounding_box_img = []
    for c in absolute_coord:
        bounding_box_img.append(image_np[c[1]:c[3], c[0]:c[2],:])
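
The two steps above can also be folded into one helper. This is only a minimal sketch: the function name crop_detections and the synthetic image, boxes, and scores are made up for illustration, and it assumes the boxes are normalized [ymin, xmin, ymax, xmax] rows as in the tutorial's output_dict. It also clamps the coordinates to the image bounds and skips empty crops, which the snippet above does not do.

```python
import numpy as np

def crop_detections(image_np, boxes, scores, threshold=0.7):
    """Return one cropped array per detection with score >= threshold.

    Assumes each box is normalized [ymin, xmin, ymax, xmax].
    """
    img_height, img_width = image_np.shape[:2]
    crops = []
    for box, score in zip(boxes, scores):
        if score < threshold:
            continue
        ymin, xmin, ymax, xmax = box
        # Scale to pixels and clamp so the slice stays inside the image.
        top = max(0, int(ymin * img_height))
        left = max(0, int(xmin * img_width))
        bottom = min(img_height, int(ymax * img_height))
        right = min(img_width, int(xmax * img_width))
        # Skip degenerate boxes that would produce an empty crop.
        if bottom > top and right > left:
            crops.append(image_np[top:bottom, left:right])
    return crops

# Synthetic stand-ins for image_np and the output_dict arrays.
image_np = np.zeros((100, 200, 3), dtype=np.uint8)
boxes = np.array([[0.25, 0.25, 0.75, 0.5],   # kept: score above threshold
                  [0.0, 0.0, 1.0, 1.0]])     # dropped: score below threshold
scores = np.array([0.9, 0.3])
crops = crop_detections(image_np, boxes, scores)
```

With the real tutorial variables, the call would presumably look like crop_detections(image_np, output_dict['detection_boxes'], output_dict['detection_scores']).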
    

    Then save each NumPy array in bounding_box_img as an image. The crops are in [img_height, img_width, img_channel] layout, so depending on the library you save with, you might need to change the shape or channel order. Note that the THRESHOLD check above already uses the score array to filter out low-confidence detections; raise or lower it as needed.
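
One possible way to do the saving step is with Pillow, sketched below. It assumes Pillow is installed and that each crop is a uint8 RGB array in [height, width, channel] layout, which Image.fromarray accepts without reshaping. The helper name save_crops, the file name pattern, and the stand-in arrays are hypothetical.

```python
import os
import tempfile

import numpy as np
from PIL import Image

def save_crops(crops, out_dir, prefix="crop"):
    """Write each H x W x 3 uint8 array to out_dir as a PNG and return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, crop in enumerate(crops):
        path = os.path.join(out_dir, f"{prefix}_{i}.png")
        # fromarray interprets an H x W x 3 uint8 array as an RGB image.
        Image.fromarray(crop).save(path)
        paths.append(path)
    return paths

# Stand-in crops; in practice this would be the bounding_box_img list.
bounding_box_img = [
    np.zeros((50, 50, 3), dtype=np.uint8),
    np.full((20, 30, 3), 255, dtype=np.uint8),
]
out_dir = tempfile.mkdtemp()
saved_paths = save_crops(bounding_box_img, out_dir)
```

If you save with OpenCV instead, keep in mind that cv2.imwrite expects BGR channel order, so an RGB crop would need its channels reversed first.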

    PS: I might have mixed up img_height and img_width, but this should give you a starting point.