tensorflow, object-detection

Is the IOU in the Tensorflow Object Detection API wrong?


I just dug a bit through the Tensorflow Object Detection API code, especially the eval_util part, as I wanted to implement the COCO metrics.

But I noticed that the metrics are calculated solely from the bounding boxes, whose coordinates are normalized to [0, 1]. Neither aspect ratios nor absolute coordinates are used.

So, doesn't this mean that the intersection-over-union values calculated from these boxes are incorrect? Take a 200x100 pixel image (200 px wide, 100 px tall) as an example. If a box is off by 20 px to the left, that is 0.1 in normalized coordinates; but if it is off by 20 px towards the top, that is 0.2 in normalized coordinates.

Doesn't that mean that being off towards the top penalizes the score more heavily than being off to the side?
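For concreteness, here is the arithmetic above as a tiny sketch (the 200x100 image size comes from the example; everything else is just illustration):

    # 200x100 (width x height) pixel image from the example above.
    img_w, img_h = 200, 100

    # The same 20 px localization error, expressed in normalized coordinates.
    dx_norm = 20 / img_w  # 0.1 -- error towards the left
    dy_norm = 20 / img_h  # 0.2 -- error towards the top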


Solution

  • I believe the predicted coordinates are resized to the absolute image coordinates in the eval binary.

    But the other thing I would say is that IOU is scale invariant, in the sense that if you scale both boxes by the same per-axis factors, they still have the same IOU overlap. As an example, scaling by 2 in the x-direction and by 3 in the y-direction: if A = (x1, y1, x2, y2) and B = (u1, v1, u2, v2), then IOU(A, B) = IOU((2*x1, 3*y1, 2*x2, 3*y2), (2*u1, 3*v1, 2*u2, 3*v2)). This holds because the intersection area and the union area are both multiplied by the same factor (here 2 * 3 = 6), so their ratio does not change (see the numeric sketch below).

    What this means is that evaluating in normalized coordinates should give the same result as evaluating in absolute coordinates.
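
    A minimal sketch to check this numerically (the iou helper and the box values below are made up for illustration; this is not code from the Object Detection API):

        def iou(box_a, box_b):
            """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
            ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
            ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
            area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
            return inter / (area_a + area_b - inter)

        def scale(box, sx, sy):
            """Scale a box by sx in x and sy in y (e.g. normalized -> pixel coordinates)."""
            x1, y1, x2, y2 = box
            return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

        # Two boxes in normalized [0, 1] coordinates (hypothetical values).
        a = (0.25, 0.25, 0.75, 0.75)
        b = (0.35, 0.15, 0.85, 0.65)

        # The same boxes in absolute coordinates of a 200x100 image, as in the question.
        a_px = scale(a, 200, 100)
        b_px = scale(b, 200, 100)

        print(iou(a, b))        # ~0.4706
        print(iou(a_px, b_px))  # ~0.4706 -- identical to the normalized result

    Both calls print the same value: the intersection and union areas are each multiplied by the same factor (200 * 100), so the ratio is unchanged.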