I am trying to build a quality-testing framework for my text annotator. I wrote my annotators using GATE.
I have gold-standard (human-annotated) data for every input document.
Here are the GATE resources I have looked at: the list of GATE resources for quality assurance, and the GATE Embedded API for the measures.
So far, I am able to get performance metrics (FP, TP, FN, precision, recall and F-scores) using the methods in AnnotationDiffer.
Now I want to dive deeper. I would like to look at the individual FPs and FNs on a per-document basis, i.e. I want to analyze each FP and FN so that I can fix my annotator accordingly.
I didn't see any method in any of GATE's classes, such as AnnotationDiffer, that returns a List<Annotation> of FPs or FNs. They just return the counts of FPs and FNs:
int fp = annotationDiffer.getFalsePositivesStrict();
int fn = annotationDiffer.getMissing();
Before I go ahead and create my own utility that gets a List<Annotation> of FPs and FNs plus a couple of surrounding sentences, and builds an HTML report per input document for analysis, I wanted to check whether something like that already exists.
I figured out how to get the FP and FN annotations:
// calculateDiff must run first; it populates the missing/spurious sets read below
List<AnnotationDiffer.Pairing> differ = annotationDiffer.calculateDiff(goldAnnotSet, systemAnnotSet);
for (Annotation fnAnnotation : annotationDiffer.missingAnnotations)
{
    System.out.println("FN=>" + fnAnnotation);
}
for (Annotation fpAnnotation : annotationDiffer.spuriousAnnotations)
{
    System.out.println("FP=>" + fpAnnotation);
}
Based on the offsets of fnAnnotation or fpAnnotation, I can easily get the surrounding sentences and create a nice HTML report.
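For that last step, here is a minimal sketch of how the report fragment could be built. It is deliberately GATE-free so it runs on its own: it takes the document text and the start/end character offsets as plain values (in GATE these would presumably come from something like annotation.getStartNode().getOffset() and document.getContent().toString()). The class name ErrorContextReport, its methods, and the fixed character window used as a stand-in for real sentence boundaries are all my own assumptions for illustration, not part of GATE.

```java
import java.util.List;

// Hypothetical helper, not part of GATE: renders each FP/FN with some
// surrounding text as an HTML list item.
final class ErrorContextReport {

    // Escapes the characters that are significant inside HTML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds one list item: label, left context, highlighted span, right context.
    // 'window' is a number of characters, clamped to the document bounds.
    static String contextHtml(String text, String kind, int start, int end, int window) {
        int from = Math.max(0, start - window);
        int to = Math.min(text.length(), end + window);
        return "<li>" + kind + ": "
                + escape(text.substring(from, start))
                + "<mark>" + escape(text.substring(start, end)) + "</mark>"
                + escape(text.substring(end, to))
                + "</li>";
    }

    // Wraps already rendered items in an unordered list.
    static String report(List<String> items) {
        StringBuilder sb = new StringBuilder("<ul>\n");
        for (String item : items) {
            sb.append(item).append("\n");
        }
        return sb.append("</ul>").toString();
    }

    public static void main(String[] args) {
        String text = "Alice met Bob in Paris.";
        // Offsets are made up for the demo; real ones would come from the differ.
        List<String> items = List.of(
                contextHtml(text, "FN", 10, 13, 4),
                contextHtml(text, "FP", 17, 22, 4));
        System.out.println(report(items));
    }
}
```

To get whole sentences instead of a fixed window, the from/to bounds could be widened to the offsets of the enclosing Sentence annotations, if the pipeline produces them.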