
How do I get the pixel coordinates for a specific bounding box


I'm trying to get the pixel coordinates of the bounding boxes for the person class, which is defined in mscoco_label_map.pbtxt as:

item {
  name: "/m/01g317"
  id: 1
  display_name: "person"
}
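(For reference, the category_index passed to the visualization call below is, as built by label_map_util.create_category_index_from_labelmap, a plain dict keyed by class id, so the id for "person" can also be looked up programmatically. A minimal sketch, using a hand-built stand-in for the real category_index:)

```python
# Stand-in for the dict that label_map_util builds from the label map;
# the real entries have the same {'id': ..., 'name': ...} shape.
category_index = {
    1: {'id': 1, 'name': 'person'},
    18: {'id': 18, 'name': 'dog'},
}

# Find the numeric id(s) whose display name is "person"
person_ids = [c['id'] for c in category_index.values() if c['name'] == 'person']
print(person_ids)  # → [1]
```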

Currently I'm drawing the bounding boxes and labels onto the image via:

input_tensor = tf.convert_to_tensor(np.expand_dims(image_np, 0), dtype=tf.float32)
detections, predictions_dict, shapes = detect_fn(input_tensor)
label_id_offset = 1
image_np_with_detections = image_np.copy()

viz_utils.visualize_boxes_and_labels_on_image_array(
          image_np_with_detections,
          detections['detection_boxes'][0].numpy(),
          (detections['detection_classes'][0].numpy() + label_id_offset).astype(int),
          detections['detection_scores'][0].numpy(),
          category_index,
          use_normalized_coordinates=True,
          max_boxes_to_draw=3,
          min_score_thresh=.30,
          agnostic_mode=False)

(All of this code is inside a while loop.)

But when I print out detections['detection_boxes'] I get a long list of normalized coordinates, and I don't know how to match those coordinates to a specific box, e.g. one with the person label.

So how can I get the pixel coordinates of a specific bounding box in detections['detection_boxes']?

New to StackOverflow, so any tips are greatly appreciated.


Solution

  • So detection_boxes should be an N by 4 array of bounding box co-ordinates in the form [ymin, xmin, ymax, xmax] in normalised co-ordinates, and detection_classes should be an array of numeric class labels (floats, if I remember correctly). I'm assuming the API hasn't changed much, since I haven't used the Object Detection API since early last year.

    You should be able to do something like this to convert to pixel co-ordinates and then select just one class's boxes:

    detection_boxes = detections['detection_boxes'][0].numpy()
    detection_classes = detections['detection_classes'][0].numpy().astype(int) + label_id_offset
    detection_scores = detections['detection_scores'][0].numpy()
    
    # Scale to pixel co-ordinates
    detection_boxes[:, (0, 2)] *= IMAGE_HEIGHT
    detection_boxes[:, (1, 3)] *= IMAGE_WIDTH
    
    # Select person boxes
    cond = (detection_classes == PERSON_CLASS_ID) & (detection_scores >= SCORE_THRESH)
    person_boxes = detection_boxes[cond, :]
    person_boxes = np.round(person_boxes).astype(int)
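    Here is a minimal, self-contained sketch of the same steps, using made-up box/class/score arrays in place of the real detections tensors; IMAGE_HEIGHT, IMAGE_WIDTH, PERSON_CLASS_ID and SCORE_THRESH are hypothetical values you would set yourself (in the real pipeline, IMAGE_HEIGHT, IMAGE_WIDTH = image_np.shape[:2]):

```python
import numpy as np

# Hypothetical constants for this sketch
IMAGE_HEIGHT, IMAGE_WIDTH = 480, 640
PERSON_CLASS_ID = 1      # "person" in mscoco_label_map.pbtxt
SCORE_THRESH = 0.30      # matches min_score_thresh in the question

# Fake detections standing in for detections['detection_boxes'][0].numpy() etc.
detection_boxes = np.array([[0.10, 0.20, 0.50, 0.40],   # a person
                            [0.30, 0.60, 0.80, 0.90]])  # something else
detection_classes = np.array([1, 18])
detection_scores = np.array([0.95, 0.85])

# Scale normalised [ymin, xmin, ymax, xmax] to pixel co-ordinates
detection_boxes[:, (0, 2)] *= IMAGE_HEIGHT
detection_boxes[:, (1, 3)] *= IMAGE_WIDTH

# Keep only confident person boxes
cond = (detection_classes == PERSON_CLASS_ID) & (detection_scores >= SCORE_THRESH)
person_boxes = np.round(detection_boxes[cond, :]).astype(int)

for ymin, xmin, ymax, xmax in person_boxes:
    print(f"person box: top-left=({xmin}, {ymin}), bottom-right=({xmax}, {ymax})")
# → person box: top-left=(128, 48), bottom-right=(256, 240)
```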