I created a `Doc` object from a custom list of tokens, following the documentation, like so:
```python
import spacy
from spacy.tokens import Doc

nlp = spacy.load("my_ner_model")
doc = Doc(nlp.vocab, words=["Hello", ",", "world", "!"])
```
How do I write named entity tags to `doc` with my NER model now?

I tried `doc = nlp(doc)`, but that didn't work; it raised a `TypeError`.

I can't just join my list of words into plain text and call `doc = nlp(text)` as usual, because then spaCy splits some of the words in my texts into two tokens, which I can't accept.
You can get the NER component from your loaded model and call it directly on the constructed `Doc`:

```python
doc = nlp.get_pipe("ner")(doc)
```
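A self-contained sketch of the same idea follows. Since `my_ner_model` isn't available here, it assumes the spaCy v3 API and swaps in a blank English pipeline with a rule-based `entity_ruler` as a stand-in for a trained `ner` component. The point it shows is the same: fetch a component by name and call it on a pre-tokenized `Doc`, so the tokenizer never runs.

```python
import spacy
from spacy.tokens import Doc

# Stand-in for spacy.load("my_ner_model") (assumption): a blank English
# pipeline whose only component is a rule-based entity_ruler, used here
# in place of a trained "ner" component.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "GPE", "pattern": [{"LOWER": "world"}]}])

# Build the Doc from your own tokens; the tokenizer is not involved.
doc = Doc(nlp.vocab, words=["Hello", ",", "world", "!"])

# Look the component up by name and call it on the Doc directly.
# With a real trained model this would be nlp.get_pipe("ner")(doc).
doc = nlp.get_pipe("entity_ruler")(doc)

print([t.text for t in doc])
print([(ent.text, ent.label_) for ent in doc.ents])
```

The token list is exactly the four words passed in, and the entity annotations are written onto that same `Doc`.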
You can inspect a list of all the available components in the pipeline with `nlp.pipe_names` and call them individually this way. The tokenizer always runs first when you call `nlp()`, but it isn't included in this list, which only contains the components that both take and return a `Doc`.