My code uses Python's multiprocessing for parallel computation. As part of the computation, spaCy is used. Is it safe to create a single spaCy object with nlp = spacy.load("de_core_news_lg") and access it from multiple processes for named entity recognition?
You can take advantage of multiprocessing with spaCy by passing the n_process argument to nlp.pipe. For example:
import spacy

docs = ["This is the first doc", "this is the second doc"]
nlp = spacy.load("en_core_web_sm")  # use your model here
docs_tokens = []
for doc in nlp.pipe(docs, n_process=2):
    tokens = [t.text for t in doc]
    docs_tokens.append(tokens)
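For your named entity recognition case, the same pattern applies; here is a minimal sketch, assuming the de_core_news_lg model is installed and using made-up example texts:

import spacy

nlp = spacy.load("de_core_news_lg")
texts = ["Angela Merkel besuchte Berlin.", "Siemens hat seinen Sitz in München."]

# Each worker process gets its own copy of the pipeline, so you don't
# need to share a single nlp object between processes yourself.
for doc in nlp.pipe(texts, n_process=2):
    print([(ent.text, ent.label_) for ent in doc.ents])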
There's more about this in the spaCy documentation on multiprocessing, as well as in the Speed FAQ.