
Sharing Spacy model between processes


My code uses Python's multiprocessing for parallel computation. As part of the computation, spaCy is used. Is it safe to create a single spaCy object with nlp = spacy.load("de_core_news_lg") and access it from multiple processes for named entity recognition?


Solution

  • You can take advantage of multiprocessing with spaCy by passing the n_process argument to nlp.pipe. For example:

    import spacy

    docs = ["This is the first doc", "this is the second doc"]

    nlp = spacy.load("en_core_web_sm")  # use your model here

    docs_tokens = []
    for doc in nlp.pipe(docs, n_process=2):
        tokens = [t.text for t in doc]
        docs_tokens.append(tokens)
    

    There's more about this in the spaCy documentation, as well as this Speed FAQ.
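If you manage the processes yourself rather than using nlp.pipe, the usual pattern is not to share one loaded object but to load the model once per worker process, e.g. via a Pool initializer. Below is a minimal sketch of that pattern; load_model here is a hypothetical stand-in for an expensive call like spacy.load("de_core_news_lg"), kept as a cheap placeholder so the example is self-contained:

```python
from multiprocessing import Pool

_model = None  # per-process global, populated once in each worker


def load_model():
    # Hypothetical placeholder for spacy.load("de_core_news_lg")
    # or any other expensive, non-picklable resource.
    return str.upper


def init_worker():
    # Runs once per worker process, not once per task,
    # so the model is loaded only as many times as there are workers.
    global _model
    _model = load_model()


def process_text(text):
    # Each task reuses the worker's already-loaded model.
    return _model(text)


if __name__ == "__main__":
    with Pool(processes=2, initializer=init_worker) as pool:
        results = pool.map(process_text, ["erste", "zweite"])
    print(results)
```

On Linux, where workers are forked, a model loaded before the fork is also inherited copy-on-write, but the initializer pattern above works regardless of the start method and avoids pickling the model.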