Understanding spaCy's Scorer Output

I built a custom NER model with spaCy and I'm evaluating it on each of my training sets using spaCy's Scorer class.

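    import spacy
    from spacy.gold import GoldParse
    from spacy.scorer import Scorer

    def Eval(examples):
        # Load the saved model from disk
        print("Loading from", './model6/')
        ner_model = spacy.load('./model6/')

        scorer = Scorer()
        try:
            for input_, annot in examples:
                # Build the gold-standard parse from the annotated entity spans
                doc_gold_text = ner_model.make_doc(input_)
                gold = GoldParse(doc_gold_text, entities=annot['entities'])
                # Run the model and score its prediction against the gold parse
                pred_value = ner_model(input_)
                scorer.score(pred_value, gold)
        except Exception as e:
            print(e)

        # Accumulated scores for everything seen so far
        return scorer.scores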


It works fine but I don't understand the output. Here's what I get for each training set.

{'uas': 0.0, 'las': 0.0, 'ents_p': 90.14084507042254, 'ents_r': 92.7536231884058, 'ents_f': 91.42857142857143, 'tags_acc': 0.0, 'token_acc': 100.0}

{'uas': 0.0, 'las': 0.0, 'ents_p': 91.12227805695142, 'ents_r': 93.47079037800687, 'ents_f': 92.28159457167091, 'tags_acc': 0.0, 'token_acc': 100.0}

{'uas': 0.0, 'las': 0.0, 'ents_p': 92.45614035087719, 'ents_r': 92.9453262786596, 'ents_f': 92.70008795074759, 'tags_acc': 0.0, 'token_acc': 100.0}

{'uas': 0.0, 'las': 0.0, 'ents_p': 94.5993031358885, 'ents_r': 94.93006993006993, 'ents_f': 94.76439790575917, 'tags_acc': 0.0, 'token_acc': 100.0}

{'uas': 0.0, 'las': 0.0, 'ents_p': 92.07920792079209, 'ents_r': 93.15525876460768, 'ents_f': 92.61410788381743, 'tags_acc': 0.0, 'token_acc': 100.0}

Does anyone know what the keys mean? I've looked through spaCy's documentation and could not find an explanation.



    • UAS (Unlabelled Attachment Score) and LAS (Labelled Attachment Score) are the standard metrics for evaluating dependency parsing. UAS is the proportion of tokens whose head has been correctly assigned; LAS is the proportion of tokens whose head has been correctly assigned with the right dependency label (subject, object, etc.).
    • ents_p, ents_r and ents_f are the precision, recall and F-score for the NER task, reported as percentages.
    • tags_acc is the POS tagging accuracy. Like uas and las, it is 0.0 in your output, presumably because your gold annotations only contain entities.
    • token_acc seems to be the precision for token segmentation.
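
To make the definitions above concrete, here is a minimal sketch of how such scores can be computed by hand. It is not spaCy's implementation: the helper names `prf` and `attachment_scores` and the toy data are made up for illustration, and it assumes an entity only counts as correct when its start, end and label all match a gold entity exactly.

```python
def prf(gold_spans, pred_spans):
    """Precision, recall and F-score (as percentages) over entity spans.

    Each span is a (start_char, end_char, label) tuple. Assumption: a
    prediction is correct only if all three fields match a gold span.
    """
    gold, pred = set(gold_spans), set(pred_spans)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision * 100, recall * 100, f_score * 100


def attachment_scores(gold_arcs, pred_arcs):
    """UAS and LAS (as percentages) for one sentence.

    Each list holds one (head_index, dep_label) pair per token.
    """
    n = len(gold_arcs)
    if n == 0:
        return 0.0, 0.0
    pairs = list(zip(gold_arcs, pred_arcs))
    uas_hits = sum(g[0] == p[0] for g, p in pairs)  # head matches
    las_hits = sum(g == p for g, p in pairs)        # head and label match
    return uas_hits / n * 100, las_hits / n * 100


# Hypothetical toy data: 2 of 4 predicted entities match the 3 gold entities.
gold_ents = [(0, 5, "PERSON"), (10, 16, "ORG"), (20, 26, "GPE")]
pred_ents = [(0, 5, "PERSON"), (10, 16, "GPE"), (20, 26, "GPE"), (30, 34, "ORG")]
p, r, f = prf(gold_ents, pred_ents)
print(round(p, 2), round(r, 2), round(f, 2))  # 50.0 66.67 57.14

# Hypothetical toy parse: all heads correct, one label wrong.
gold_arcs = [(1, "nsubj"), (1, "ROOT"), (1, "dobj")]
pred_arcs = [(1, "nsubj"), (1, "ROOT"), (1, "nmod")]
uas, las = attachment_scores(gold_arcs, pred_arcs)
print(round(uas, 2), round(las, 2))  # 100.0 66.67
```

As a sanity check on the numbers in the question: 64 correct entities out of 71 predicted and 69 gold would give a precision of 90.14, a recall of 92.75 and an F-score of 91.43, which matches the first result row. The actual counts may differ, but the F-score is the harmonic mean of the other two values either way.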