
How to load a customized NER model from disk with spaCy?


I have customized a NER pipeline with the following procedure:

import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp("I am going to Vallila. I am going to Sörnäinen.")
for ent in doc.ents:
    print(ent.text, ent.label_)

LABEL = 'DISTRICT'
TRAIN_DATA = [
    (
    'We need to deliver it to Vallila', {
        'entities': [(25, 32, 'DISTRICT')]
    }),
    (
    'We need to deliver it to somewhere', {
        'entities': []
    }),
]

ner = nlp.get_pipe("ner")
ner.add_label(LABEL)

nlp.disable_pipes("tagger")
nlp.disable_pipes("parser")
nlp.disable_pipes("attribute_ruler")
nlp.disable_pipes("lemmatizer")
nlp.disable_pipes("tok2vec")

optimizer = nlp.get_pipe("ner").create_optimizer()
import random
from spacy.training import Example

for i in range(25):
    random.shuffle(TRAIN_DATA)
    for text, annotation in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotation)
        nlp.update([example], sgd=optimizer)

I tried to save that customized NER model to disk and load it again with the following code:

ner.to_disk('/home/feru/ner')

import spacy
from spacy.pipeline import EntityRecognizer
nlp = spacy.load("en_core_web_lg", disable=['ner'])

ner = EntityRecognizer(nlp.vocab)
ner.from_disk('/home/feru/ner')
nlp.add_pipe(ner)

However, I got the following error:

---> 10 ner = EntityRecognizer(nlp.vocab)
     11 ner.from_disk('/home/feru/ner')
     12 nlp.add_pipe(ner)

~/.local/lib/python3.8/site-packages/spacy/pipeline/ner.pyx in spacy.pipeline.ner.EntityRecognizer.__init__()

TypeError: __init__() takes at least 2 positional arguments (1 given)

This method of saving and loading a custom component from disk seems to be from some early spaCy version. What's the second argument EntityRecognizer needs?


Solution

  • The general process you are following, serializing a single component and reloading it on its own, is not the recommended way to do this in spaCy. You can do it (it has to be possible internally, of course), but you generally want to save and load pipelines using the high-level wrappers. In this case that means you would save like this:

    nlp.to_disk("my_model") # NOT ner.to_disk
    

    And then load it with spacy.load("my_model").
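
    As a minimal sketch of that round trip (assuming the pipeline trained above was saved to a hypothetical "my_model" directory):

    import spacy

    # Reload the whole pipeline that was saved with nlp.to_disk("my_model")
    nlp = spacy.load("my_model")

    doc = nlp("We need to deliver it to Vallila")
    # should now include the custom DISTRICT label
    print([(ent.text, ent.label_) for ent in doc.ents])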

    You can find more detail about this in the saving and loading docs. Since it seems you're just getting started with spaCy, you might want to go through the course too. It covers the new config-based training in v3, which is much easier than writing your own custom training loop like the one in your code sample.
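
    As a rough sketch of that config-based workflow (the file names and paths here are placeholders, and the training data needs to be converted to .spacy files first):

    # generate a minimal NER config, then train from the command line
    python -m spacy init config config.cfg --lang en --pipeline ner
    python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy

    The trained pipeline ends up in ./output/model-best and can be loaded with spacy.load("./output/model-best").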

    If you want to mix and match components from different pipelines, you will still generally want to save entire pipelines, and you can then combine components from them using the "sourcing" feature, as sketched below.
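
    A small sketch of sourcing, again assuming the hypothetical "my_model" pipeline saved above:

    import spacy

    # Load the pipeline that contains the trained NER, then copy that
    # component into a fresh en_core_web_lg pipeline via sourcing
    source_nlp = spacy.load("my_model")
    nlp = spacy.load("en_core_web_lg", exclude=["ner"])
    nlp.add_pipe("ner", source=source_nlp)

    doc = nlp("We need to deliver it to Vallila")
    print([(ent.text, ent.label_) for ent in doc.ents])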