Tags: nlp, spacy, named-entity-recognition

Unexpected type of NER data when training the spaCy NER pipe to add a new named entity


I'm trying to add a new named entity to spaCy, but I couldn't find good examples of how to build Example objects for NER training, and I'm getting a ValueError. Here is my code:

import random
import spacy
from spacy.util import minibatch, compounding
from pathlib import Path
from spacy.training import Example

nlp=spacy.load('en_core_web_lg')

ner=nlp.get_pipe("ner")
TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[0,2,'CRORG']}),
           ('we stand with ABC',{'entities':[24,26,'CRORG']}),
           ('we supports ABC',{'entities':[15,17,'CRORG']})]
ner.add_label('CRORG')
# Disable pipeline components that don't need to change
pipe_exceptions = ["ner"]
unaffected_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]

with nlp.disable_pipes(*unaffected_pipes):
    for iteration in range(30):
        random.shuffle(TRAIN_DATA)
        for raw_text,entity_offsets in TRAIN_DATA:
            doc=nlp.make_doc(raw_text)
            nlp.update([Example.from_dict(doc,entity_offsets)])

Here is the error message I'm getting


Solution

  • The 'entities' value in each TRAIN_DATA item is supposed to be a list of tuples, one (start, end, label) tuple per entity, not a flat list. In other words, it has to be 2D, not just 1D.

    So instead of:

    TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[0,2,'CRORG']}),
               ('we stand with ABC',{'entities':[24,26,'CRORG']}),
               ('we supports ABC',{'entities':[15,17,'CRORG']})]
    

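    Use (the character offsets are also corrected here so that each span covers exactly 'ABC'; the end offset is exclusive):

    TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[(0,3,'CRORG')]}),
               ('we stand with ABC',{'entities':[(14,17,'CRORG')]}),
               ('we supports ABC',{'entities':[(12,15,'CRORG')]})]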
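
Character offsets in this format are easy to get wrong when typed by hand. The following is a minimal sketch, assuming each entity phrase occurs exactly once in its sentence, that computes the (start, end, label) tuples from the text itself and checks that every annotation has the expected shape before it is passed to Example.from_dict. The helper names make_example and check_entities are hypothetical and not part of spaCy; the sketch uses plain Python only.

```python
def make_example(text, phrase, label):
    """Build one (text, annotations) pair with offsets computed from the text."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in {text!r}")
    # The end offset is exclusive, so text[start:end] == phrase.
    return (text, {"entities": [(start, start + len(phrase), label)]})


def check_entities(train_data):
    """Raise ValueError unless every entity is a (start, end, label) triple
    whose offsets fall inside its text."""
    for text, annotations in train_data:
        for ent in annotations["entities"]:
            # A flat list such as [0, 2, 'CRORG'] fails here, because its
            # first element is an int rather than a 3-item tuple.
            if not (isinstance(ent, (list, tuple)) and len(ent) == 3):
                raise ValueError(f"expected (start, end, label), got {ent!r}")
            start, end, _label = ent
            if not (0 <= start < end <= len(text)):
                raise ValueError(f"offsets {start}-{end} fall outside {text!r}")


TRAIN_DATA = [
    make_example("ABC is a worldwide organization", "ABC", "CRORG"),
    make_example("we stand with ABC", "ABC", "CRORG"),
    make_example("we supports ABC", "ABC", "CRORG"),
]
check_entities(TRAIN_DATA)
```

Running check_entities on the original flat-list data raises a ValueError up front, which may be easier to read than the error raised from inside the training loop.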