
Spacy NER not recognising NAME


Can anyone please help me understand why spaCy NER fails to recognize the last occurrence of the name 'Hagrid' in the text below, no matter which model is used (sm, md, lg)?

"Hermione bought a car, then both Hermione and Hagrid raced it on the track. Tom Brady was very happy with Hagrid this year."

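import spacy

# Load the medium English pipeline (the sm and lg models behave the same way here)
nlp = spacy.load('en_core_web_md')

test_data = "Hermione bought a car, then both Hermione and Hagrid raced it on the track. Tom Brady was very happy with Hagrid this year."

doc = nlp(test_data)
# Print each recognized entity with its character offsets and label
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)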



Solution

  • Neural network models are essentially black boxes, so there is no way to know this for sure.

    I could imagine that the grammar in the last sentence is a bit too "fancy" or literary for a model trained on news or web data, and that this throws the model off. The model may not see that sentence context as one that is typically followed by a name, and "Hagrid" is also a fairly unusual name. Together, these could be the reason.

    You can try some other models such as the one integrated in Flair:

    https://huggingface.co/flair/ner-english-large?text=Hermione+bought+a+car%2C+then+both+Hermione+and+Hagrid+raced+it+on+the+track.+Tom+Brady+was+very+happy+with+Hagrid+this+year.

    or this fine-tuned BERT model:

    https://huggingface.co/dslim/bert-large-NER?text=Hermione+bought+a+car%2C+then+both+Hermione+and+Hagrid+raced+it+on+the+track.+Tom+Brady+was+very+happy+with+Hagrid+this+year.

    They are more powerful and get it right. In my experience, spaCy is a nice tool and quite fast, but not the most precise for NER.
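To run the comparison locally instead of through the hosted demo pages, something like the following could work. It is only a minimal sketch, assuming the `transformers` package and a backend such as PyTorch are installed and that the `dslim/bert-large-NER` checkpoint linked above can be downloaded. The helper names `format_entities`, `missed_mentions`, and `run_demo` are made up for this illustration and are not part of any library.

```python
TEXT = (
    "Hermione bought a car, then both Hermione and Hagrid raced it on the track. "
    "Tom Brady was very happy with Hagrid this year."
)


def format_entities(text, entities):
    """Turn aggregated pipeline output into (surface text, start, end, label) tuples."""
    rows = []
    for ent in entities:
        start, end = ent["start"], ent["end"]
        # Slice the original string instead of using ent["word"],
        # which may contain word-piece artifacts.
        rows.append((text[start:end], start, end, ent["entity_group"]))
    return rows


def missed_mentions(text, name, rows):
    """Return offsets where `name` occurs in `text` but no entity span covers it."""
    missed = []
    pos = text.find(name)
    while pos != -1:
        covered = any(s <= pos and pos + len(name) <= e for _, s, e, _ in rows)
        if not covered:
            missed.append(pos)
        pos = text.find(name, pos + 1)
    return missed


def run_demo():
    """Download the model (large) and print the entities it finds in TEXT."""
    # Imported here so the helpers above stay usable without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    ner = pipeline("ner", model="dslim/bert-large-NER", aggregation_strategy="simple")
    rows = format_entities(TEXT, ner(TEXT))
    for row in rows:
        print(*row)
    print("Uncovered 'Hagrid' offsets:", missed_mentions(TEXT, "Hagrid", rows))
```

Calling `run_demo()` prints one line per entity in the same text/start/end/label layout as the spaCy loop in the question, followed by any offsets where 'Hagrid' was not tagged. The same two helpers can be reused for spaCy by passing tuples built from `doc.ents`.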