python matplotlib coordinates rectangles bounding-box

How to make sense of Open Images Dataset's bounding-box annotations?


I downloaded the Open Images Dataset via TensorFlow Datasets (https://www.tensorflow.org/datasets). I can view the images and annotations fine, but I can't make sense of the format used for the object bounding boxes.

For example: I have an image of an elephant that is 682 pixels wide and 1024 pixels high. The bounding box coordinates of the elephant are: [0.03875 , 0.188732, 0.954375, 0.979343]. According to the documentation, the four numbers represent xMin, xMax, yMin, yMax.

How do I display this rectangle, given these weirdly small values, with, say, matplotlib?
I already tried multiplying the coordinates by the width and height respectively, but the resulting rectangles don't make any sense. I also tried swapping the values for x_1 and x_2 (and likewise for y_1 and y_2), but that didn't work either.

This is my code:

for e in train_data:

    np_img = e["image"]

    height = np.shape(np_img)[0]
    width = np.shape(np_img)[1]

    fig, ax = plt.subplots(1)

    ax.imshow(np_img)

    for bbox in e["bobjects"]["bbox"]:

        x_1 = bbox[0]
        x_2 = bbox[1]

        y_1 = bbox[2]
        y_2 = bbox[3]

        rect = patches.Rectangle((x_1 * width, y_2 * height), (x_2 * width - x_1 * width), (y_1 * height - y_2 * height), linewidth=1, edgecolor='r', facecolor='none')

        ax.add_patch(rect)

    plt.show()

    # Only one iteration for testing
    break

Solution

  • I found the solution myself. As it turns out, when Open Images is loaded through the TensorFlow Datasets API, the bounding-box coordinates are in a different order from the one documented on the dataset's website.
    The website describes the order of the four values for each box as:
    xMin, xMax, yMin, yMax.
    However, the order in the TF Datasets API is yMin, xMin, yMax, xMax. I found this out by comparing the annotations of a single image, matched by its image ID, with the annotations.csv file from the website. The values are normalized to the range 0 to 1, so the only step left to get pixel coordinates for the boxes is to multiply the x values by the width of the image and the y values by its height.
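The conversion described above could be sketched roughly as follows. This is a minimal illustration, not the asker's final code: the helper names `tfds_bbox_to_pixel_rect` and `draw_boxes` are made up for this example, and `draw_boxes` assumes the image is a NumPy-like array of shape (height, width, channels).

```python
def tfds_bbox_to_pixel_rect(bbox, width, height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box (TF Datasets order)
    into (x, y, box_width, box_height) in pixels, with (x, y) the top-left corner.

    Hypothetical helper written for this answer.
    """
    ymin, xmin, ymax, xmax = (float(v) for v in bbox)
    x = xmin * width    # x values scale with the image width
    y = ymin * height   # y values scale with the image height
    return x, y, (xmax - xmin) * width, (ymax - ymin) * height


def draw_boxes(image, bboxes):
    """Show `image` with one red rectangle per box (hypothetical helper)."""
    # Imported here so the conversion helper above has no plotting dependency.
    import matplotlib.pyplot as plt
    from matplotlib import patches

    height, width = image.shape[:2]
    fig, ax = plt.subplots(1)
    ax.imshow(image)  # imshow puts the origin at the top-left by default
    for bbox in bboxes:
        x, y, w, h = tfds_bbox_to_pixel_rect(bbox, width, height)
        ax.add_patch(patches.Rectangle((x, y), w, h, linewidth=1,
                                       edgecolor="r", facecolor="none"))
    plt.show()
```

For the elephant box from the question, `tfds_bbox_to_pixel_rect([0.03875, 0.188732, 0.954375, 0.979343], 682, 1024)` gives roughly x = 128.7, y = 39.7, width = 539.2, height = 937.6, which covers most of the image as expected. Inside the original loop, the call would be something like `draw_boxes(e["image"], e["bobjects"]["bbox"])`, assuming the examples have been converted to NumPy arrays first.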