I am currently building a spaCy pipeline with custom NER, Entity Linker and Textcat components. For my Entity Linker component, I have modified the candidate_generator() to suit my use case, using the ner_emersons demo project for reference. The following is my custom_functions code:
import spacy
from functools import partial
from pathlib import Path
from typing import Iterable, Callable
from spacy.training import Example
from spacy.tokens import DocBin
from spacy.kb import Candidate, KnowledgeBase, get_candidates


@spacy.registry.misc("Custom_Candidate_Gen.v1")
def create_candidates():
    return custom_get_candidates


def custom_get_candidates(kb, span):
    return kb.get_alias_candidates(span.text.lower())


@spacy.registry.readers("MyCorpus.v1")
def create_docbin_reader(file: Path) -> Callable[["Language"], Iterable[Example]]:
    return partial(read_files, file)


def read_files(file: Path, nlp: "Language") -> Iterable[Example]:
    # we run the full pipeline and not just nlp.make_doc to ensure we have entities and sentences
    # which are needed during training of the entity linker
    with nlp.select_pipes(disable="entity_linker"):
        doc_bin = DocBin().from_disk(file)
        docs = doc_bin.get_docs(nlp.vocab)
        for doc in docs:
            yield Example(nlp(doc.text), doc)
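One consequence of lowercasing the mention text in custom_get_candidates is worth spelling out: an alias lookup of this kind is an exact string match, so aliases would need to be stored in lowercase in the knowledge base to be found. The sketch below is only an illustration under that assumption. ToyKB is a hypothetical dict-backed stand-in (not spaCy's KnowledgeBase), the entity IDs are made up, and a plain string is used in place of a Span.

```python
from typing import Dict, List


class ToyKB:
    """Hypothetical stand-in for a knowledge base: maps alias strings to entity IDs."""

    def __init__(self) -> None:
        self._aliases: Dict[str, List[str]] = {}

    def add_alias(self, alias: str, entities: List[str]) -> None:
        self._aliases[alias] = list(entities)

    def get_alias_candidates(self, alias: str) -> List[str]:
        # Exact-match lookup; unknown aliases yield no candidates.
        return self._aliases.get(alias, [])


def custom_get_candidates(kb: ToyKB, mention_text: str) -> List[str]:
    # Same idea as the generator above, with a plain string instead of a Span.
    return kb.get_alias_candidates(mention_text.lower())


toy_kb = ToyKB()
toy_kb.add_alias("emerson", ["ENTITY_A"])        # alias stored in lowercase
toy_kb.add_alias("Keith Emerson", ["ENTITY_B"])  # alias stored in mixed case

# The lowercase alias is found regardless of how the mention is capitalised.
found = custom_get_candidates(toy_kb, "Emerson")

# The mixed-case alias is never matched, because the lookup key is lowercased.
missed = custom_get_candidates(toy_kb, "Keith Emerson")

print(found, missed)
```

If the real knowledge base was built with mixed-case aliases, either the aliases or the generator would need to change so that both sides use the same casing.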
After training my entity linker and adding my textcat component to the pipeline, I am getting the following error:
catalogue.RegistryError: [E893] Could not find function 'Custom_Candidate_Gen.v1' in function registry 'misc'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.
Available names: spacy.CandidateGenerator.v1, spacy.EmptyKB.v1, spacy.KBFromFile.v1, spacy.LookupsDataLoader.v1, spacy.ngram_range_suggester.v1, spacy.ngram_suggester.v1
Why isn't my custom Candidate Generator getting registered?
Your options for having custom code loaded and registered when you load a model:
- package the code with the model using spacy package --code and load the model from the installed package name (rather than the directory)
- use entry points in setup.cfg to register the methods (which works fine, but wouldn't be my first choice in this situation)
See: