search-engine information-retrieval precision-recall

Understanding Recall and Precision

I am currently learning information retrieval and I am rather stuck on an example of recall and precision.

A searcher uses a search engine to look for information. There are 10 documents on the first screen of results and 10 on the second.

Assume there are known to be 10 relevant documents in the search engine's index.

So there are 20 results altogether, of which 10 are relevant.

Can anyone help me make sense of this?

Thanks

Solution

Recall and precision measure the quality of your result. To understand them, let's first define the possible outcomes. For a given query, every document in the index falls into one of four groups. "Positive" or "negative" says whether the document was returned; "true" or "false" says whether that decision was correct.

Classified correctly:
- a true positive (TP): a document which is relevant and was returned
- a true negative (TN): a document which is not relevant and was not returned

Misclassified:
- a false positive (FP): a document which is not relevant but was returned
- a false negative (FN): a document which is relevant but was not returned
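
As a rough illustration of the four groups above, here is a minimal Python sketch that sorts documents into TP, FP, FN and TN with set operations. The function name `confusion_sets` and the six-document toy index are made up for this example; they are not part of the question.

```python
def confusion_sets(index, relevant, retrieved):
    """Split every document in the index into TP, FP, FN and TN."""
    index, relevant, retrieved = set(index), set(relevant), set(retrieved)
    return {
        "TP": relevant & retrieved,          # relevant and returned
        "FP": retrieved - relevant,          # returned but not relevant
        "FN": relevant - retrieved,          # relevant but not returned
        "TN": index - relevant - retrieved,  # neither relevant nor returned
    }


# Hypothetical toy index of six documents.
index = {"d1", "d2", "d3", "d4", "d5", "d6"}
relevant = {"d1", "d2", "d3"}
retrieved = {"d2", "d3", "d4"}

groups = confusion_sets(index, relevant, retrieved)
for name, docs in groups.items():
    print(name, sorted(docs))
```

Note that TN is the only group that needs the whole index; neither precision nor recall uses it.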

The precision is then:

|TP| / (|TP| + |FP|)

i.e. the fraction of retrieved documents that are indeed relevant.

The recall is then:

|TP| / (|TP| + |FN|)

i.e. the fraction of relevant documents that are in your result set.

So, in your example, 10 out of 20 results are relevant, which gives a precision of 10/20 = 0.5. If the index contains no relevant documents beyond these 10, nothing relevant was missed (FN = 0), so the recall is 10/10 = 1.
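
The same arithmetic as a small Python sketch, in case it helps to see the numbers plugged into the formulas. The helper names `precision` and `recall` are just illustrative, and the second scenario (only 6 of the 10 relevant documents returned) is a hypothetical variation, not something stated in the question.

```python
from fractions import Fraction


def precision(tp, fp):
    """Fraction of returned documents that are relevant."""
    return Fraction(tp, tp + fp)


def recall(tp, fn):
    """Fraction of relevant documents that were returned."""
    return Fraction(tp, tp + fn)


# The example from the question: 20 results, 10 of them relevant,
# and (assumption) all 10 relevant documents in the index were returned.
p = precision(tp=10, fp=10)
r = recall(tp=10, fn=0)
print(float(p), float(r))  # 0.5 1.0

# Hypothetical variation: only 6 of the 20 results are relevant,
# so 4 of the 10 relevant documents were missed.
p2 = precision(tp=6, fp=14)
r2 = recall(tp=6, fn=4)
print(float(p2), float(r2))  # 0.3 0.6
```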

(When measuring the performance of an information retrieval system, it only makes sense to consider precision and recall together. You can easily get a precision of 100% by returning almost nothing, e.g. a single document you are sure is relevant (no spurious result => no FP; with no results at all the ratio is 0/0, which is strictly undefined), or a recall of 100% by returning every document in the index (no relevant document is missed => no FN).)
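
To make that trade-off concrete, here is a minimal Python sketch of the two extreme strategies. The index of 100 documents with 10 relevant ones is an invented example, and `precision_recall` is a hypothetical helper. Returning nothing at all gives 0/0 for precision, so the sketch reports it as `None` rather than picking a convention.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall); a value is None when it would be 0/0."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else None
    recall = tp / len(relevant) if relevant else None
    return precision, recall


# Hypothetical index: documents 0..99, of which 0..9 are relevant.
index = set(range(100))
relevant = set(range(10))

everything = precision_recall(index, relevant)  # return every document
single = precision_recall({0}, relevant)        # return one relevant document
nothing = precision_recall(set(), relevant)     # return no documents

print(everything)  # (0.1, 1.0)
print(single)      # (1.0, 0.1)
print(nothing)     # (None, 0.0)
```

Each extreme maximizes one measure while the other collapses, which is why the two are usually reported together.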