
Custom Named Entity Recognition


I have the following sentence:

text="The weather is extremely severe in England"

I want to perform a custom named entity recognition (NER) procedure.

First, a standard NER pipeline outputs England with a GPE label:

pip install spacy

!python -m spacy download en_core_web_lg

import spacy
nlp = spacy.load('en_core_web_lg')

doc = nlp(text)

for ent in doc.ents:
    print(ent.text+' - '+ent.label_+' - '+str(spacy.explain(ent.label_)))

Result: England - GPE - Countries, cities, states

However, I want the whole sentence to get the label High_Severity.

So I am doing the following procedure:

from spacy.tokens import Span

# Register the custom label in the vocabulary's string store
nlp.vocab.strings.add('High_Severity')

# Get the hash value of the High_Severity entity label
High_Severity = doc.vocab.strings['High_Severity']

# Create a Span covering the whole sentence (tokens 0-6; the end index is exclusive)
new_ent = Span(doc, 0, 7, label=High_Severity)

# Add the entity to the existing Doc object
doc.ents = list(doc.ents) + [new_ent]

I get the following error:

ValueError: [E1010] Unable to set entity information for token 6 which is included in more than one span in entities, blocked, missing or outside.

From my understanding, this happens because NER has already recognised England as GPE, and a second label cannot be added over a token that already belongs to an entity.
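The conflict can be reproduced in a minimal sketch, assuming spaCy v3. To avoid downloading en_core_web_lg it uses a blank English pipeline (`spacy.blank`) and adds the GPE entity for "England" by hand as a stand-in for the model's prediction. It then shows one possible workaround, `spacy.util.filter_spans`, which resolves overlaps by keeping the longest span; note that this drops the GPE annotation.

```python
import spacy
from spacy.tokens import Span
from spacy.util import filter_spans

# Blank pipeline: tokenizer only, so no model download is needed.
nlp = spacy.blank("en")
doc = nlp("The weather is extremely severe in England")

# Stand-in for the entity a trained NER model would predict for "England".
doc.ents = [Span(doc, 6, 7, label="GPE")]

# Whole-sentence span; a string label is added to the StringStore automatically.
new_ent = Span(doc, 0, len(doc), label="High_Severity")

# Appending it fails because token 6 would belong to two entities.
rejected = None
try:
    doc.ents = list(doc.ents) + [new_ent]
except ValueError as err:
    rejected = str(err)
    print("Overlap rejected:", rejected)

# filter_spans keeps the longest of any overlapping spans, so only the
# High_Severity span survives and "England" loses its GPE label.
doc.ents = filter_spans(list(doc.ents) + [new_ent])
print([(ent.text, ent.label_) for ent in doc.ents])
```

Because `doc.ents` can hold only one label per token, any fix that stays inside `doc.ents` has to give up one of the two annotations.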

I also tried running only the custom NER code (i.e., without first running the standard NER code), but this did not solve the problem.

Any ideas on how to solve this problem?


Solution

  • Indeed, it looks like NER does not allow overlapping entities, and that is your problem: the second part of your code tries to create an entity that contains another entity (England), so it fails. See:

    https://github.com/explosion/spaCy/discussions/10885

    This is why spaCy has span categorization, whose spans (stored in doc.spans) are allowed to overlap.

    I have not yet found a way to categorize a predefined span (one not coming from a trained model).
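A minimal sketch, assuming spaCy v3, of at least storing a predefined span (built by hand, not predicted by a model) with its label in `doc.spans`. Span groups there are independent of `doc.ents`, so the sentence-level label and the GPE entity can coexist. The key `"sc"` is the default key used by the `spancat` component; any other key would work for plain storage. As above, a blank pipeline with a hand-added GPE entity stands in for en_core_web_lg.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("The weather is extremely severe in England")

# Stand-in for the entity a trained NER model would predict for "England".
doc.ents = [Span(doc, 6, 7, label="GPE")]

# doc.spans holds named span groups that, unlike doc.ents, may overlap.
doc.spans["sc"] = [Span(doc, 0, len(doc), label="High_Severity")]

# Both annotations are now present on the same Doc.
print([(ent.text, ent.label_) for ent in doc.ents])
print([(span.text, span.label_) for span in doc.spans["sc"]])
```

This only attaches the label to a span you already know; predicting High_Severity on new text would still require training a span categorizer.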