
What is mAP in object detection?


I have been reading through this blog to find out what mAP is. In the subheading on AP, they give the example of 5 apple images and work out the average precision. As far as I understand, a false positive is when the object is localized and classified but IoU < 0.5 (in the blog), and a false negative is when the model fails to identify an object at all. So what about objects that are misclassified? Don't they count as false positives?

Also, what does the table in the blog really represent? Is the 'correct?' column for one particular image or for all 5 images together? Could you briefly explain what is going on in your own terms, or just what the blog says?


Solution

  • What is mAP in object detection?

    mAP is just mean average precision, which is the mean of the APs over all object classes. For example, if you had 5 object classes, each would have an average precision (AP), and mAP would be the sum of those APs divided by 5.
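
    A minimal sketch of that calculation, using made-up AP values for 5 hypothetical classes:

    ```python
    # Hypothetical per-class AP values -- the numbers are invented for
    # illustration, not taken from the blog.
    average_precisions = {
        "apple": 0.80,
        "orange": 0.65,
        "banana": 0.90,
        "pear": 0.70,
        "grape": 0.85,
    }

    # mAP is simply the arithmetic mean of the per-class APs.
    mAP = sum(average_precisions.values()) / len(average_precisions)
    print(mAP)  # 0.78
    ```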

    false positive is when the object is localized and classified but IOU < 0.5

    In object detection, we can have multiple classes of objects. The background is also a class, but an implicit one. For example, if we had 3 classes of objects (e.g. apple, orange, banana), the network would treat it as 4 classes (apple, orange, banana, background); the only difference is that the program doesn't draw bounding boxes around regions predicted as background.

    False positive means that the object detection model has reported a part of the image as an object of a specific class (e.g. apple) when there is no apple in that part of the image. There is either another fruit, like an orange (misclassification), or no fruit at all (background). Both cases look the same to the network, and we count both as false positives: the network has mistakenly treated that region as a positive sample for a specific class. The IoU can have any value in this case; it does not matter. Misclassified objects are therefore included in the false positives, because they are reported as positive for a specific class when in fact they are negative for it (they belong to another class or to the background).

    False negative means the model has predicted a part of the image as background when it actually contains an object. In other words, the network has failed to detect the object and has mistakenly reported it as background.

    what does the table in the blog really represent?

    The IoU (Intersection over Union) used in the blog to decide whether a detection is correct is calculated by dividing the area of the intersection between the detected box and the ground-truth box (the box drawn by a human as the correct box) by the area of their union.
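
    A small sketch of that formula, assuming boxes are given as corner coordinates `(x1, y1, x2, y2)`:

    ```python
    def iou(box_a, box_b):
        """IoU for two boxes given as (x1, y1, x2, y2) corner coordinates."""
        # Coordinates of the intersection rectangle.
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])

        # Clamp at zero in case the boxes do not overlap at all.
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])

        # Union = sum of areas minus the intersection counted twice.
        return inter / (area_a + area_b - inter)

    # A predicted box shifted slightly off the ground truth:
    # intersection is 8*8 = 64, union is 100 + 100 - 64 = 136.
    print(iou((0, 0, 10, 10), (2, 2, 12, 12)))  # ~0.47, so below the 0.5 threshold
    ```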


    So if the IoU is more than 0.5, it means the network has predicted the apple's position correctly. In the table, 'correct?' is evaluated for each individual apple detection, and the precision is the number of correct predictions so far divided by the total number of predictions so far.
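
    The running-precision column of such a table can be sketched like this (the True/False pattern is made up, not the blog's actual values):

    ```python
    # Detections sorted by confidence; each entry says whether that detection
    # was 'correct' (IoU > 0.5 with a ground-truth apple).  Invented pattern.
    correct = [True, True, False, False, True]

    # Running precision after each detection:
    # correct predictions so far / total predictions so far.
    tp = 0
    for rank, is_correct in enumerate(correct, start=1):
        tp += is_correct
        print(f"rank {rank}: precision = {tp / rank:.2f}")
    ```

    This is the quantity that gets averaged (across recall levels) to produce the AP for one class, and averaging those APs across classes then gives the mAP described above.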