perl sequence precision tagging

How to understand the script conlleval.perl for evaluating tagging?


I have never been sure how to evaluate tagging tasks such as POS tagging or other sequence tagging. In particular, I don't know how to calculate precision, recall, and F1 score for them. I found a script named conlleval.perl that can be used directly for evaluation, but I don't know Perl, and I am still confused about how P, R, and F1 are calculated for tagging tasks. Can anyone explain?


Solution

  • A simple definition is given in the book Spoken Language Understanding: Systems for Extracting Semantic Information from Speech (by Gokhan Tur and Renato De Mori), chapter 3.1.5, Evaluation metrics:

    Precision = # of reference slots correctly detected by SLU / # of total slots detected by SLU

    Recall = # of reference slots correctly detected by SLU / # of total reference slots

    F1 = 2 x Precision x Recall / (Precision + Recall)

    Note: for the overall metrics, conlleval uses micro-averaging, i.e. the counts are pooled over all slot types before precision and recall are computed.
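
To make the definitions above concrete, here is a minimal Python sketch of chunk-level scoring. It is not a port of conlleval: the function names (`extract_chunks`, `prf`, `evaluate`) and the example tags are made up for illustration, and it assumes BIO-style tags such as `B-PER`, `I-PER`, `O` for a single sequence. A slot counts as correctly detected only if its start, end, and type all match the reference exactly.

```python
from collections import Counter


def extract_chunks(tags):
    """Return a set of (start, end, type) spans from a BIO tag sequence."""
    chunks = set()
    start, ctype = None, None
    # A trailing "O" sentinel closes a chunk that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        # The open chunk ends at "O", at a new "B-", or when the type changes.
        if start is not None and (prefix != "I" or label != ctype):
            chunks.add((start, i, ctype))
            start, ctype = None, None
        # A chunk starts at "B-", or at an "I-" that continues nothing
        # (a lenient choice made for this sketch).
        if prefix in ("B", "I") and start is None:
            start, ctype = i, label
    return chunks


def prf(n_correct, n_predicted, n_reference):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = n_correct / n_predicted if n_predicted else 0.0
    r = n_correct / n_reference if n_reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def evaluate(gold_tags, pred_tags):
    """Return (overall, per_type) scores; overall is micro-averaged."""
    gold = extract_chunks(gold_tags)
    pred = extract_chunks(pred_tags)
    correct = gold & pred  # exact match on boundaries and type
    n_correct = Counter(c[2] for c in correct)
    n_pred = Counter(c[2] for c in pred)
    n_gold = Counter(c[2] for c in gold)
    per_type = {
        t: prf(n_correct[t], n_pred[t], n_gold[t])
        for t in sorted(set(n_pred) | set(n_gold))
    }
    # Micro-averaging: pool the counts over all types, then compute P/R/F1.
    overall = prf(len(correct), len(pred), len(gold))
    return overall, per_type


gold = ["B-PER", "I-PER", "O", "B-LOC", "O", "O"]
pred = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "B-ORG"]

(p, r, f), per_type = evaluate(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
# precision=0.333 recall=0.500 f1=0.400
```

In this example the reference has 2 slots (PER, LOC) and the prediction has 3 (PER, a LOC with the wrong end boundary, and a spurious ORG). Only PER matches exactly, so precision is 1/3 and recall is 1/2. For a multi-sentence corpus, the counts would need to be accumulated sentence by sentence so that chunks cannot span sentence boundaries.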