Tags: python, nlp, spacy, python-poetry, virtual-environment

How to download spaCy models in a Poetry managed environment


I am writing a Python Jupyter notebook that does some NLP processing on Italian texts.

I installed spaCy 3.5.3 via Poetry and then tried to run the following code:

import spacy
load_model = spacy.load('it_core_news_sm')

The import line works as expected, but running spacy.load produces the following error:

OSError: [E050] Can't find model 'it_core_news_sm'. It doesn't seem to be a Python package or a valid path to a data directory.

The model name is correct, as shown on https://spacy.io/models/it.

After a web search, I see that a solution is to issue the following command:

python3 -m spacy download it_core_news_sm

After running this command, the code above works as expected. However, is there a more 'kosher' way of doing this via Poetry?


Solution

  • You can add a URL dependency. First edit your pyproject.toml file to add the following (note: the name used here should match the name of the package, i.e. it_core_news_sm):

    [tool.poetry.dependencies]
    it_core_news_sm = {url = "https://github.com/explosion/spacy-models/releases/download/it_core_news_sm-3.5.0/it_core_news_sm-3.5.0.tar.gz"}
    

    Then run the corresponding add call:

    poetry add https://github.com/explosion/spacy-models/releases/download/it_core_news_sm-3.5.0/it_core_news_sm-3.5.0.tar.gz
    

    All of the spaCy models can be found on spaCy's model releases GitHub page.
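
As a rough sketch of how this might be generalised to other models, the snippet below builds the release URL from a model name and version, and checks whether the model package is importable in the current environment (for example, from the notebook kernel). The helper names `model_release_url` and `is_model_installed` are hypothetical, and the URL pattern is simply copied from the `it_core_news_sm-3.5.0` URL above; it is an assumption that other models and versions follow the same layout, so check the releases page before relying on it.

```python
import importlib.util

# URL pattern copied from the it_core_news_sm-3.5.0 URL in this answer.
# Assumption: other models and versions use the same layout.
RELEASE_URL = (
    "https://github.com/explosion/spacy-models/releases/download/"
    "{name}-{version}/{name}-{version}.tar.gz"
)


def model_release_url(name: str, version: str) -> str:
    """Hypothetical helper: build the download URL for a spaCy model release."""
    return RELEASE_URL.format(name=name, version=version)


def is_model_installed(name: str) -> bool:
    """Return True if a top-level package called `name` is importable here.

    spaCy models installed through pip or Poetry are ordinary Python
    packages, so an importability check is enough for this purpose.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    url = model_release_url("it_core_news_sm", "3.5.0")
    # Print the command to run in a shell inside the Poetry project.
    print(f"poetry add {url}")
    print("model installed:", is_model_installed("it_core_news_sm"))
```

If `is_model_installed` returns False inside the notebook after the `poetry add` call succeeded, the notebook kernel is probably not running in the Poetry virtual environment.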