Tags: spacy, named-entity-recognition, precision-recall, spacy-3

Why is my spaCy v3 scorer returning 0 for precision, recall and f1?


I have the following code (migrated from spaCy v2) where I would like to calculate the precision, recall and F1-score for a given model:

import spacy
from spacy.scorer import Scorer
from spacy.training import Example

nlp = spacy.load("my_model")
scorer = Scorer(nlp)
examples = []
for text, annotations in TEST_DATA:
    examples.append(Example.from_dict(nlp.make_doc(text), annotations))
results = scorer.score(examples)
print(
    "Precision {:0.4f}\tRecall {:0.4f}\tF-score {:0.4f}".format(results['ents_p'], results['ents_r'], results['ents_f'])
)

What I'm trying to understand is why it always returns

Precision 0.0000    Recall 0.0000   F-score 0.0000

My TEST_DATA set is in the same format as the TRAIN_DATA set I used to train the model. Here is what it looks like:

[
    (
        'Line 106 – for dilution times, the units should be specified', {'entities': [(51, 60, 'ACTION'), (41, 47, 'MODAL'), (11, 40, 'CONTENT'), (0, 8, 'LOCATION')]}
    ),
    (
        'It should be indicated what test was applied  to verify the normality of distribution.', {'entities': [(13, 22, 'ACTION'), (28, 85, 'CONTENT'), (3, 9, 'MODAL')]}
    )
]
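
As a sanity check on the annotation format (a small hypothetical snippet, not part of the model code), each annotated span can be printed back from the text to confirm the character offsets line up with the labels:

for text, annotations in TEST_DATA:
    for start, end, label in annotations["entities"]:
        # e.g. LOCATION 'Line 106' for the first sentence
        print(label, repr(text[start:end]))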

Solution

  • The scorer does not run the pipeline on the predicted docs, so you're evaluating blank docs (the output of nlp.make_doc) against your reference annotations.

    The recommended way is to use nlp.evaluate instead:

    scores = nlp.evaluate(examples)
    

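    Put together with the setup from the question, the full evaluation could look like this (a minimal sketch assuming TEST_DATA is defined as above; nlp.evaluate runs the pipeline on the predicted docs itself, so nlp.make_doc is sufficient here):

    import spacy
    from spacy.training import Example

    nlp = spacy.load("my_model")

    # nlp.evaluate processes the predicted docs through the pipeline,
    # so blank docs from make_doc are fine as a starting point.
    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in TEST_DATA
    ]
    scores = nlp.evaluate(examples)
    print(
        "Precision {:0.4f}\tRecall {:0.4f}\tF-score {:0.4f}".format(
            scores["ents_p"], scores["ents_r"], scores["ents_f"]
        )
    )
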
    If you want to call the scorer directly for some reason, the alternative is to run the pipeline on the predicted docs (nlp instead of nlp.make_doc), so:

    example = Example.from_dict(nlp(text), annotations)
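
    Spelled out, the direct-Scorer variant would be (a sketch under the same assumptions as above):

    import spacy
    from spacy.scorer import Scorer
    from spacy.training import Example

    nlp = spacy.load("my_model")
    scorer = Scorer(nlp)

    examples = []
    for text, annotations in TEST_DATA:
        # nlp(text) runs the full pipeline, so the predicted doc
        # contains the model's entity predictions instead of being blank.
        examples.append(Example.from_dict(nlp(text), annotations))

    results = scorer.score(examples)
    print(results["ents_p"], results["ents_r"], results["ents_f"])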