What is Multiple Object Tracking Precision (MOTP)?

Multiple Object Tracking Precision (MOTP) is one of the metrics defined in the CLEAR MOT paper for evaluating multiple object tracking algorithms. That paper defines it as the average distance between the predicted object location and the ground-truth object location, taken over all predictions that are successfully matched to a ground truth. The distance can be an absolute (pixel) distance or, more commonly I think when objects are denoted by bounding boxes, 1 - IoU, where IoU is the intersection-over-union between the ground-truth and predicted bounding boxes. In either case you want the distance to be small, so MOTP should be as close to zero as possible.
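To make the distance-based reading concrete, here is a minimal sketch of how I understand it, assuming boxes are given as (x1, y1, x2, y2) tuples and that the matching between predictions and ground truth has already been done. The function names `iou` and `motp_distance` are my own and do not come from the paper or any benchmark toolkit.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def motp_distance(matched_pairs):
    """Average 1 - IoU over (prediction, ground truth) pairs; lower is better.

    Assumes the pairs are already matched; the matching step is not shown.
    """
    if not matched_pairs:
        raise ValueError("MOTP is undefined when there are no matches")
    total = sum(1.0 - iou(pred, gt) for pred, gt in matched_pairs)
    return total / len(matched_pairs)
```

With one perfect match and one pair whose IoU is 1/3, this gives a value of 1/3, and a perfect tracker would score 0.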

This is where I am confused: some multiple object tracking benchmarks (see UA-DETRAC and MOT Challenge) list MOTP as a percentage, and the goal is for it to be as high as possible. The MOT Challenge website even cites the CLEAR MOT metrics as its source for this metric, even though the two definitions clearly differ!

So, to put my question succinctly, why do these benchmarks use a percentage for MOTP instead of an absolute value, and why is the goal for it to be as high as possible? What does this metric actually represent?

Solution

MOT16: A Benchmark for Multi-Object Tracking (see Section 4.1.5, Multiple Object Tracking Precision) defines MOTP as a measure of bounding box overlap rather than distance, so higher is better. Quoting the paper:

MOTP thereby gives the average overlap between all correctly matched hypotheses and their respective objects and ranges between 50% and 100%.
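As a rough illustration of why the value sits between 50% and 100%, here is a sketch that assumes a hypothesis only counts as matched when its IoU with the assigned object is at least 0.5, and that takes the IoU values of already-assigned pairs as input. The function name `motp_overlap`, the input layout, and the `threshold` parameter are my own choices for illustration, not part of any benchmark toolkit.

```python
def motp_overlap(ious_per_frame, threshold=0.5):
    """Average overlap (in percent) over matched pairs; higher is better.

    ious_per_frame: one list per frame with the IoU of each assigned
    (hypothesis, object) pair. Pairs below `threshold` are treated as
    not matched and are left out of the average (an assumption here).
    """
    matched = [v for frame in ious_per_frame for v in frame if v >= threshold]
    if not matched:
        raise ValueError("MOTP is undefined when there are no matches")
    return 100.0 * sum(matched) / len(matched)
```

Because every value that enters the average is at least the threshold, the result cannot fall below 50% with the default setting. If the distance in the original definition is taken to be 1 - IoU, this overlap version is simply 100 * (1 - distance-based MOTP) over the same matches.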

You can check the precise formula in the paper.
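For convenience, the formula as I recall it from that section is below; please check the notation against the paper itself.

```latex
\mathrm{MOTP} = \frac{\sum_{t,i} d_{t,i}}{\sum_{t} c_{t}}
```

Here c_t is the number of matches in frame t and d_{t,i} is the bounding box overlap of target i with its assigned ground-truth object in frame t. The CLEAR MOT definition has the same form, except that d is a distance there, which explains why one version should be low and the other high.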