python nlp spacy named-entity-recognition

Confidence Score of Predicted NER Entities Using spaCy


I am trying to predict entities with a custom-trained NER model in spaCy. I read in https://github.com/explosion/spaCy/pull/8855 that a confidence score for each entity can be obtained using spancat, but I am unsure how to make that work. As I understand it, we have to train a pipeline that includes the spancat component. The training config file contains this segment:

[nlp]
lang = "en"
pipeline = ["tok2vec","ner"]
batch_size = 1000

Do we have to change this to

[nlp]
lang = "en"
pipeline = ["tok2vec","ner","spancat"]
batch_size = 1000

for the spancat to work?

Then, after training, when predicting entities in unseen text, do we have to use

doc = nlp(data_to_be_predicted)
spans = doc.spans["spancat"] # SpanGroup
print(spans.attrs["scores"]) # one number per span, same length as the SpanGroup

to get the confidence scores?

I am using spaCy 3.1.3. According to the documentation, I believe this feature has been released by now.


Solution

  • I'm not really sure there's a question in your post, but yes, the spancat component is available and you can get scores for the spans it predicts.

    The spancat is a different component from the ner component. So if you do this:

    pipeline = ["tok2vec","ner","spancat"]
    

    The spancat will not add scores to the entities your ner component predicted. It makes its own predictions and stores them in doc.spans, not in doc.ents. You probably want to remove the ner component.


    For usage, please see the spaCy docs and the spancat example project. This is how you get the scores:

    doc = nlp(text)
    span_group = doc.spans["sc"] # "sc" is the default spans_key; it can be changed in the config
    scores = span_group.attrs["scores"]
    
    # Note that `scores` is an array with one score for each span in the group
    for span, score in zip(span_group, scores):
        print(score, span)
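
If you only want the confident predictions, the loop above can be wrapped in a small helper. This is a minimal sketch, not part of spaCy: the function name `scored_spans` and the `min_score` parameter are made up for illustration. It only assumes what the snippet above already relies on, namely that the span group stores one score per span under `attrs["scores"]`, in the same order as the spans.

```python
from typing import List, Tuple


def scored_spans(span_group, min_score: float = 0.0) -> List[Tuple[str, str, float]]:
    """Pair each span in a SpanGroup with its spancat score.

    Hypothetical helper (not a spaCy API). Returns (text, label, score)
    tuples, highest score first, keeping only spans whose score is at
    least `min_score`.
    """
    # One score per span, in the same order as the spans in the group.
    scores = span_group.attrs["scores"]
    rows = [
        (span.text, span.label_, float(score))
        for span, score in zip(span_group, scores)
        if float(score) >= min_score
    ]
    # Sort by score, most confident prediction first.
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

You would call it as `scored_spans(doc.spans[key], min_score=0.8)`, where `key` is whatever `spans_key` your spancat component was configured with. Keep in mind that spancat already drops spans below its own `threshold` setting, so `min_score` can only filter further, not recover spans that were never added to the group.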