I have the following code (migrated from spaCy v2) where I would like to calculate the precision, recall and F1-score for a given model:
import spacy
from spacy.scorer import Scorer
from spacy.training import Example

nlp = spacy.load("my_model")
scorer = Scorer(nlp)
examples = []
for text, annotations in TEST_DATA:
    examples.append(Example.from_dict(nlp.make_doc(text), annotations))
results = scorer.score(examples)
print(
    "Precision {:0.4f}\tRecall {:0.4f}\tF-score {:0.4f}".format(
        results['ents_p'], results['ents_r'], results['ents_f']
    )
)
The weird thing I'm trying to understand is why it always returns:
Precision 0.0000 Recall 0.0000 F-score 0.0000
My TEST_DATA set has the same form as the TRAIN_DATA set I used to train the model. Here is what it looks like:
[
    (
        'Line 106 – for dilution times, the units should be specified',
        {'entities': [(51, 60, 'ACTION'), (41, 47, 'MODAL'), (11, 40, 'CONTENT'), (0, 8, 'LOCATION')]}
    ),
    (
        'It should be indicated what test was applied to verify the normality of distribution.',
        {'entities': [(13, 22, 'ACTION'), (28, 85, 'CONTENT'), (3, 9, 'MODAL')]}
    )
]
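For reference, the entity offsets are plain character offsets into the text, so they can be checked by slicing (using the first example above):

text = 'Line 106 – for dilution times, the units should be specified'
print(text[0:8])    # 'Line 106'   -> LOCATION
print(text[41:47])  # 'should'     -> MODAL
print(text[51:60])  # 'specified'  -> ACTION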
The scorer does not run the pipeline on the predicted docs, and nlp.make_doc only tokenizes the text, so you're evaluating blank docs with no predicted entities against your test cases.
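A quick way to see this (a sanity check using your loaded nlp and a sentence from your TEST_DATA):

doc = nlp.make_doc('It should be indicated what test was applied to verify the normality of distribution.')
print(doc.ents)  # (): make_doc only tokenizes, so nothing is predicted
print(nlp('It should be indicated what test was applied to verify the normality of distribution.').ents)  # the model's actual predictions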
The recommended way is to use nlp.evaluate instead:
scores = nlp.evaluate(examples)
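Putting it together, a minimal sketch that keeps the names from your snippet (nlp.evaluate runs the pipeline on the predicted docs itself before scoring, so nlp.make_doc is fine here):

import spacy
from spacy.training import Example

nlp = spacy.load("my_model")
examples = [
    Example.from_dict(nlp.make_doc(text), annotations)
    for text, annotations in TEST_DATA
]
scores = nlp.evaluate(examples)
print(scores['ents_p'], scores['ents_r'], scores['ents_f'])
print(scores['ents_per_type'])  # per-label precision/recall/F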
If you want to call the scorer directly for some reason, the other alternative is to run the pipeline on the predicted docs (nlp instead of nlp.make_doc), so:
example = Example.from_dict(nlp(text), annotations)
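Applied to your snippet, the corrected loop would look like this:

scorer = Scorer(nlp)
examples = []
for text, annotations in TEST_DATA:
    # nlp(text) runs the full pipeline, so the predicted doc
    # contains the model's entity predictions
    examples.append(Example.from_dict(nlp(text), annotations))
results = scorer.score(examples)
print(results['ents_p'], results['ents_r'], results['ents_f'])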