Tags: precision, information-retrieval, precision-recall

Top k precision


I have a database of documents in which I perform searches. For each and every search there are n positives. Now, if I evaluate the performance of the search by precision@k and recall@k, things work out just fine for the latter:

recall@k = true positives / positives = true positives / n

The number of true positives is in the range [0, n], so recall@k is in the range [0, 1] - perfect. Things get weird concerning precision@k, however. If I calculate

precision@k = tp / (tp + fp) = tp / k

If n < k, precision@k is in the range [0, n/k], which doesn't make too much sense to me. Think of the edge case n=1, for example. One cannot increase the tp because there are no more than n positives, and one cannot decrease k either because, well, it's called precision@k, isn't it?

What am I getting wrong?
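
For concreteness, here is a minimal sketch of how I compute the two metrics (the function names, document IDs, and the `ranking`/`relevant` representation are just made up for illustration):

```python
# `ranking` is the ranked list of retrieved document IDs (best first),
# `relevant` is the set of the n positives for the query.

def precision_at_k(ranking, relevant, k):
    top_k = ranking[:k]
    tp = sum(1 for doc_id in top_k if doc_id in relevant)
    return tp / k                       # denominator is always k

def recall_at_k(ranking, relevant, k):
    top_k = ranking[:k]
    tp = sum(1 for doc_id in top_k if doc_id in relevant)
    return tp / len(relevant)           # denominator is n

# Edge case n=1: even a perfect search cannot score more than 1/k.
ranking = ["d7", "d3", "d9", "d1", "d5"]
relevant = {"d7"}                        # n = 1
print(precision_at_k(ranking, relevant, 5))  # 0.2 - the best possible value
print(recall_at_k(ranking, relevant, 5))     # 1.0
```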

An example of what I'm talking about can be found in [1], figure 8b. What you can see there is a precision-recall curve for the top 1..200 query results. Even though there are fewer than 200 positives in the database, the precision is quite high.

[1] https://www.computer.org/csdl/pds/api/csdl/proceedings/download-article/19skfc3ZfKo/pdf


Solution

  • Since precision@k is computed as #num_relevant/k, its maximum possible value is 1 (which happens if all of the k top-ranked documents in your retrieved list are relevant).

    Your argument is correct in the sense that if #relevant_docs is less than k, then the P@k metric unfairly penalizes you: in that case, even with a perfect retrieval you don't score 1 on the metric.

    A standard solution is thus to take both into account and compute precision values not at arbitrary values of k but at the recall points, i.e. at those positions in your ranked list where a relevant document is retrieved. You then divide the sum of these precisions by the total number of relevant documents. For a single query this measure is the average precision; its mean over a set of queries is the mean average precision (MAP). A worked example follows, and a code sketch is given at the end of this answer.

    Let's say that you retrieved 10 documents, out of which 2 are relevant, at ranks 2 and 5 (and there are 3 relevant docs in total, one of which is not retrieved).

    You compute precision@k at the recall points (values of k = 2 and 5).

    This gives:

    1/2 (at position 2, 1 document is relevant out of 2) +
    2/5 (at position 5, 2 documents are relevant out of 5)
    

    and then you divide this sum by 3 (the total number of known relevant docs), giving (1/2 + 2/5) / 3 = 0.3. The last step favours systems that achieve high recall, whereas the cut-off-point based precisions favour systems that retrieve docs towards the top ranks.

    Note that a system which both retrieves the relevant docs at better ranks and retrieves a higher number of relevant docs will score better than a system which fails on either or both counts.

    Also note that you'll score a perfect 1 on this metric if you retrieve the 3 relevant docs at the top 3 ranks of the 10 that you retrieved in total (check this: (1/1 + 2/2 + 3/3) / 3 = 1), which addresses the concern that motivated your question.
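
    To make the computation concrete, here is a small sketch of average precision for a single query (the variable names and the list/set representation are my own, just for illustration; MAP is then the mean of this value over all your queries):

    ```python
    # `ranking` is the retrieved list (best first), `relevant` is the set of
    # all known relevant docs for the query (retrieved or not).

    def average_precision(ranking, relevant):
        hits = 0
        precision_sum = 0.0
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant:
                hits += 1
                precision_sum += hits / rank   # precision at this recall point
        return precision_sum / len(relevant)   # divide by total number of relevant docs

    # The example from above: relevant docs retrieved at ranks 2 and 5,
    # 3 relevant docs in total ("R3" is never retrieved).
    ranking = ["a", "R1", "b", "c", "R2", "d", "e", "f", "g", "h"]
    relevant = {"R1", "R2", "R3"}
    print(average_precision(ranking, relevant))                   # (1/2 + 2/5) / 3 = 0.3

    # Perfect case: all 3 relevant docs at the top 3 ranks -> score of 1.0
    print(average_precision(["R1", "R2", "R3", "a", "b"], relevant))  # 1.0
    ```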