
spaCy: How do I add named entities to an existing Doc object using a loaded NER model?


I created a Doc object from a custom list of tokens, following the documentation:

import spacy
from spacy.tokens import Doc

nlp = spacy.load("my_ner_model")
doc = Doc(nlp.vocab, words=["Hello", ",", "world", "!"])

How do I now add named entity tags to doc with my NER model?

I tried doc = nlp(doc), but that raised a TypeError.

I can't simply join my words into plain text and call doc = nlp(text) as usual, because spaCy's tokenizer then splits some of my words into two tokens, which I can't accept.


Solution

  • You can get the NER component from your loaded model and call it directly on the constructed Doc:

    doc = nlp.get_pipe("ner")(doc)
    

    You can list all the components in the pipeline with nlp.pipe_names and call each of them individually this way. The tokenizer always runs first when you call nlp(), but it isn't included in this list, which only contains the components that take a Doc and return a Doc.
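If your NER component depends on an earlier component (for example, many spaCy v3 configs have ner listen to a shared tok2vec component), calling ner alone skips that step and its predictions may suffer. A safer option is to run every component in order over nlp.pipeline, which holds (name, component) pairs. The sketch below assumes spaCy v3. The helper annotate_pretokenized is a hypothetical name, and spacy.blank("en") with a rule-based entity_ruler stands in for spacy.load("my_ner_model") so the example runs without a trained model.

```python
import spacy
from spacy.tokens import Doc


def annotate_pretokenized(nlp, words):
    """Hypothetical helper: run every pipeline component on a Doc built
    from pre-split words, bypassing the tokenizer entirely."""
    doc = Doc(nlp.vocab, words=words)
    # nlp.pipeline lists (name, component) pairs in execution order;
    # each component takes a Doc and returns a Doc.
    for _name, proc in nlp.pipeline:
        doc = proc(doc)
    return doc


# Stand-in for spacy.load("my_ner_model"): a blank English pipeline with a
# rule-based entity_ruler, so this runs without a trained model (spaCy v3 API).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "GPE", "pattern": [{"LOWER": "world"}]}])

doc = annotate_pretokenized(nlp, ["Hello", ",", "world", "!"])
print([t.text for t in doc])                         # tokens are kept exactly as given
print([(ent.text, ent.label_) for ent in doc.ents])  # entities set by the pipeline
```

With a real model, replace the stand-in setup with nlp = spacy.load("my_ner_model"). Newer spaCy releases (v3.2 and later) also accept a Doc directly in nlp(doc) and skip the tokenizer, so check your installed version before relying on that.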