I am trying to use spaCy to extract custom entities from text.
import spacy
from spacy_lookup import Entity

data = {0: ["count"], 1: ["unique count", "unique"]}

def processText(text):
    nlp = spacy.blank('en')
    for i, arr in data.items():
        fLabel = "test:" + str(i)
        fEntitty = Entity(keywords_list=list(set(arr)), label=fLabel)
        fEntitty.name = fLabel
        nlp.add_pipe(fEntitty)
    match_doc = nlp(text)
    print(match_doc.ents)

processText("unique count of city")
But the above code throws the following error:
ValueError: [E103] Trying to set conflicting doc.ents: '(1, 2, 'test:0')' and '(0, 2, 'test:1')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
This is not limited to this case. I see the same issue with person names, for example "Karthik" vs "Karthik reddy" or "Jon" vs "Jon Allen". Could anyone please help me resolve this issue?
Thanks in advance!!
In spaCy, named entities can never overlap: a token can belong to at most one entity. If "Jon Allen" is a name, you shouldn't also annotate "Jon" as a name. So before training, you'll have to fix these overlapping/conflicting cases.
EDIT after discussion in the comments:
You'll want to implement an on_match function that filters the matches down to a non-overlapping set before they are written to doc.ents.
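
As a rough illustration of the filtering step, here is a minimal sketch in plain Python. It assumes each match is a (start, end, label) tuple of token indices with an exclusive end, as in the error message above, and it prefers the longest match when two overlap. The name filter_overlapping is made up for this example and is not part of spaCy or spacy_lookup; how you hook it into an on_match callback depends on the matcher you use.

```python
def filter_overlapping(matches):
    """Keep the longest matches and drop any match that shares a token
    with a match that has already been kept.

    Each match is a (start, end, label) tuple; `end` is exclusive.
    This is a hypothetical helper, not a spaCy API.
    """
    # Longest matches first; ties go to the match that starts earliest.
    ordered = sorted(matches, key=lambda m: (m[1] - m[0], -m[0]), reverse=True)
    kept = []
    taken = set()  # token indices already covered by a kept match
    for start, end, label in ordered:
        if all(i not in taken for i in range(start, end)):
            kept.append((start, end, label))
            taken.update(range(start, end))
    # Return the surviving matches in document order.
    return sorted(kept)


# The two conflicting matches reported for "unique count of city".
matches = [(1, 2, "test:0"), (0, 2, "test:1")]
print(filter_overlapping(matches))
```

With the two matches from the question, only the longer "unique count" match (test:1) survives, so no token ends up in two entities. If you are working with Span objects instead of tuples, recent spaCy versions also provide spacy.util.filter_spans, which applies a similar longest-first rule.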