information-retrieval

Precision And Recall in IR


A set of news articles contains 30 mentions of locations. An extractor extracts 24 location entities, 6 of which are incorrect. What are the Precision and Recall values?

Correct me if I am wrong: TP = 18, FP = 6, FN = 6, so both recall and precision would be 0.75.


Solution

  • Following the Wikipedia definition:

    Precision = |relevant docs retrieved| / |retrieved documents| = 18 / 24 = 0.75

    Recall = |relevant docs retrieved| / |relevant docs| = 18 / ??

    To work out the recall we need to know how many correct locations are in the initial collection of 30 locations.

    Edit:

    The updated problem statement is the following: "Consider a set of news articles that contains 30 mentions of locations. From this source, an extractor extracts 24 location entities, 6 of which are incorrect. What are the Precision and Recall values? with possible answers:

    1. P = 0.80, R = 0.50
    2. P = 0.75, R = 0.60
    3. P = 0.60, R = 0.80
    4. P = 0.75, R = 0.50"

    None of the four options is a possible answer.

    The justification is the following. The precision can be computed as before: 18 / 24 = 0.75.

    For the recall, we don't know the total number of correct locations in the initial collection. We do know that there are at least 18 correct locations (out of a total of 30), because the extractor found that many. The collection could contain more than 18 correct locations: 19, 20, 21, 22 or 23. Values from 24 to 30 are not possible, because we know that there are at least 6 incorrect locations (the extractor found that many).

    The answer is then chosen by elimination. Since P = 0.75, only two options remain: R = 0.60 or R = 0.50.

    If we test the possible values, we have that:

    R = 18/18 = 1.00, R = 18/19 ≈ 0.95, R = 18/20 = 0.90, R = 18/21 ≈ 0.86,

    R = 18/22 ≈ 0.82, R = 18/23 ≈ 0.78.

    Since R can be neither 0.5 nor 0.6, none of the options answers this question.

    Moreover, the value proposed in your question, R = 0.75, is not possible either.

    Hope it will be helpful!
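
The arithmetic in the answer above can be checked with a short script. This is only a sketch under the answer's own assumption that the number of correct locations in the collection lies between 18 and 23; the variable names and the `recall` helper are made up for illustration and do not come from any library.

```python
from fractions import Fraction

# Figures taken from the problem statement.
extracted = 24                               # entities returned by the extractor
incorrect = 6                                # extracted entities that are wrong
correct_extracted = extracted - incorrect    # 18 true positives

# Precision = correct extractions / all extractions.
precision = Fraction(correct_extracted, extracted)


def recall(true_positives, total_relevant):
    """Recall = correct extractions / all relevant items in the collection."""
    return Fraction(true_positives, total_relevant)


# Assumption from the answer: the collection holds between 18 and 23
# correct locations, so we try each of those totals in turn.
candidates = {
    total: round(float(recall(correct_extracted, total)), 2)
    for total in range(18, 24)
}

print("P =", float(precision))
for total, r in candidates.items():
    print(f"relevant = {total}: R = {r}")
```

Under that assumption, none of the computed recall values equals 0.50 or 0.60, which is the elimination argument made in the answer.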