python nlp spacy

spaCy: Trying to set conflicting doc.ents: A token can only be part of one entity, so make sure the entities you're setting don't overlap


I am trying to use spaCy to extract custom entities from text.

import spacy
from spacy_lookup import Entity
data = {0:["count"],1:["unique count","unique"]}

def processText(text):
    nlp = spacy.blank('en')
    for i,arr in data.items():
        fLabel = "test:"+str(i)
        fEntitty = Entity(keywords_list=list(set(arr)),label=fLabel)
        fEntitty.name = fLabel
        nlp.add_pipe(fEntitty)
    match_doc = nlp(text)
    print(match_doc.ents)
processText("unique count of city")

But the above code throws the following error:

ValueError: [E103] Trying to set conflicting doc.ents: '(1, 2, 'test:0')' and '(0, 2, 'test:1')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

This is not the only case. The same issue occurs with person names, for example Karthik vs. Karthik reddy, or Jon vs. Jon Allen. Could anyone please help me resolve this issue?

Thanks in advance!!


Solution

  • In spaCy, named entities can never overlap: each token can belong to at most one entity. If "Jon Allen" is a name, you shouldn't also annotate "Jon" as a name. So before training, you'll have to fix these overlapping/conflicting cases.

    EDIT after discussion in the comments: You'll want to implement an on_match function to filter out the matches to a non-overlapping set.
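
    A minimal sketch of that filtering step, in plain Python so it runs without spaCy. It assumes the matches arrive as (start, end, label) token offsets with an exclusive end, like the tuples in the E103 message. The helper name filter_overlapping and the "longest match wins" rule are illustrative assumptions, not part of spaCy or spacy_lookup. Recent spaCy versions also ship spacy.util.filter_spans, which applies a similar longest-first rule to Span objects.

```python
def filter_overlapping(matches):
    """Return a non-overlapping subset of matches, preferring longer ones.

    `matches` is a list of (start, end, label) token offsets, end exclusive.
    Hypothetical helper: the name and tie-breaking rule are assumptions.
    """
    # Longest match first; among equal lengths, the earlier start wins.
    ordered = sorted(matches, key=lambda m: (m[1] - m[0], -m[0]), reverse=True)

    kept = []
    used_tokens = set()
    for start, end, label in ordered:
        # Skip this match if any of its tokens already belongs to a kept match.
        if any(i in used_tokens for i in range(start, end)):
            continue
        kept.append((start, end, label))
        used_tokens.update(range(start, end))

    # Return the survivors in document order.
    return sorted(kept)


# The two conflicting matches from the error message for "unique count of city".
matches = [(1, 2, "test:0"), (0, 2, "test:1")]
print(filter_overlapping(matches))
```

    With the two matches from the error message, only the longer one, (0, 2, 'test:1') for "unique count", survives. The same rule would keep "Jon Allen" over "Jon". The surviving offsets can then be turned into spans and assigned to doc.ents without triggering E103.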