
Confusion about precision-recall curve and average precision


I've been reading a lot about Precision-Recall curves in order to evaluate my image retrieval system. In particular, I'm reading this article about feature extractors in VLFeat and the Wikipedia page about precision and recall.

I understand that this curve is useful for evaluating our system's performance w.r.t. the number of elements retrieved. So we repeatedly compute precision and recall when retrieving the top element, then the top 2, the top 3, and so on. But my question is: when do we stop?

My intuition is: we stop when our list of retrieved elements has recall equal to 1, i.e. when we have retrieved all the relevant elements (there are no false negatives; every relevant element is a true positive).

The same question applies to average precision: how many elements should the retrieved result contain when computing it? If my previous intuition is correct, then we just need to find the smallest list such that recall is 1 and use it to compute AP.

I wonder why none of the libraries for computing P-R curves show how this is implemented?


Solution

  • An information retrieval system that reaches recall 1 while keeping precision high would be a near-perfect system, which doesn't seem possible in practice! (Recall 1 on its own only means every relevant item has been retrieved, which may require going very deep into the ranked list.) Precision-Recall curves are useful when you need to compare two or more information retrieval systems. It's not about stopping when recall or precision reaches some value. A Precision-Recall curve shows the pair of recall and precision values at each cutoff (for example, after the top 3 or top 5 documents). You can draw the curve up to any reasonable cutoff.

    Curves close to the perfect Precision-Recall curve indicate better performance than curves close to the baseline. In other words, a curve that lies above another curve indicates the better system. The following figure shows Precision-Recall curves for two IR systems, A and B; system A clearly outperforms system B.

    [Figure: Precision-Recall curves of IR systems A and B]

    Remember: Precision-Recall curves are not only used for evaluating IR systems. They can also show how good your classifier is! For example, you can compute precision and recall for a binary classification task and plot the Precision-Recall curve, which gives you a good estimate of your classifier's performance.

    For example:

    [Two figures: example Precision-Recall plots]

    I would encourage you to look at this tutorial from Coursera. I believe it will make Precision-Recall curves clearer to you.
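
To make the "when do we stop?" point concrete, here is a minimal Python sketch of how precision and recall at each cutoff, and one common (non-interpolated) definition of average precision, can be computed from a ranked result list. The function names `precision_recall_at_k` and `average_precision` and the 0/1 relevance list are illustrative assumptions, not taken from VLFeat or any other library; a given library may use an interpolated or otherwise different variant.

```python
from typing import List, Sequence, Tuple


def precision_recall_at_k(
    relevance: Sequence[int], total_relevant: int
) -> List[Tuple[float, float]]:
    """Return (precision, recall) after each cutoff k = 1..len(relevance).

    relevance[i] is 1 if the item at rank i + 1 is relevant, else 0.
    total_relevant is the number of relevant items in the whole collection.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    points = []
    hits = 0  # relevant items seen so far
    for k, rel in enumerate(relevance, start=1):
        hits += 1 if rel else 0
        points.append((hits / k, hits / total_relevant))
    return points


def average_precision(relevance: Sequence[int], total_relevant: int) -> float:
    """Non-interpolated AP: mean of precision@k over the ranks k that hold
    a relevant item, divided by the total number of relevant items.

    Relevant items missing from the list contribute zero.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    hits = 0
    precision_sum = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k  # precision at this relevant rank
    return precision_sum / total_relevant


# Hypothetical ranked list: 4 relevant items in total, the last one at rank 7.
ranked = [1, 0, 1, 1, 0, 0, 1]
for k, (p, r) in enumerate(precision_recall_at_k(ranked, 4), start=1):
    print(f"k={k}  precision={p:.3f}  recall={r:.3f}")
print(f"AP = {average_precision(ranked, 4):.3f}")
```

For this example list, AP comes out at about 0.747. Under this definition, ranks after the last relevant item add nothing to AP, so cutting the list where recall reaches 1 gives the same value as using the full ranking, while cutting it earlier counts the missing relevant items as zero precision.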