Tags: python, nlp, anaconda, conda, spacy

spaCy: Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory


I'm trying to load the en_core_web_sm spaCy model, but I have been unsuccessful in doing so.

The error that occurs is the following:

OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.

I'm working in an Anaconda virtual environment. I have already done the following:

  • Ran conda activate gcp-env before installing spaCy and the English language model
  • Ran conda install -c conda-forge spacy with that environment active
  • Then ran python -m spacy download en, still in that environment
  • After the first attempts failed, also tried adding spacy to requirements.txt and installing the dependencies that way

spacy info produces this output:

spacy info

============================== Info about spaCy ==============================

spaCy version    3.3.0                         
Location         /Users/simonmortensen/opt/anaconda3/envs/gcp-env/lib/python3.10/site-packages/spacy
Platform         macOS-11.6.5-x86_64-i386-64bit
Python version   3.10.4                        
Pipelines        en_core_web_sm (3.3.0)

python -m spacy validate produces this output:

================= Installed pipeline packages (spaCy v3.3.0) =================
ℹ spaCy installation:
/Users/simonmortensen/opt/anaconda3/envs/gcp-env/lib/python3.10/site-packages/spacy

NAME             SPACY                 VERSION                            
en_core_web_sm   >=3.3.0.dev0,<3.4.0   3.3.0   ✔

I've been through several previous Stack Overflow posts on the same topic. The fixes suggested there worked for others, but none of them resolved my issue.

Any advice would be very much appreciated. Thanks in advance!

Simon

EDIT: For additional context, pip list in the environment shows both

spacy                         3.3.0
spacy-legacy                  3.0.9
spacy-loggers                 1.0.2

and

en-core-web-sm                3.3.0

Even so, import en_core_web_sm also doesn't work:

import en_core_web_sm
Traceback (most recent call last):

  Input In [65] in <cell line: 1>
    import en_core_web_sm

ModuleNotFoundError: No module named 'en_core_web_sm'

Solution

  • Spyder was the villain.

    All packages were correctly installed in the virtual environment, but Spyder was not running that environment (even though the IDE was launched with the spyder command from a terminal where the environment was in fact activated).

    To make Spyder run the correct environment, you need to change the Python interpreter in the Spyder preferences to the one inside the environment, and then restart the kernel.

    I then got an error prompting me to pip install spyder-kernels==2.1.*. Once that was done (make sure to install it in the right environment), I restarted Spyder, and it finally worked!

    See the discussion in this thread: https://github.com/explosion/spaCy/discussions/10895.
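
    A quick way to check for this kind of mismatch is to ask the running interpreter where it lives and whether it can see the model package. The snippet below is only a sketch and was not part of the original thread: describe_environment is a hypothetical helper name, and it uses nothing beyond the standard library. Run it once in the IDE console and once in the terminal where the environment is activated, then compare the two outputs.

```python
import importlib.util
import sys


def describe_environment(package_name="en_core_web_sm"):
    """Report which interpreter is running and whether it can see a package.

    Hypothetical helper for illustration; not part of spaCy or Spyder.
    """
    # find_spec returns None when the top-level package is not importable
    # from the interpreter that is executing this code.
    spec = importlib.util.find_spec(package_name)
    return {
        "executable": sys.executable,  # path of the Python binary in use
        "prefix": sys.prefix,          # root directory of the active environment
        "importable": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

    If the executable printed in the IDE does not point inside the environment's directory (here, the path ending in envs/gcp-env shown by spacy info), the IDE is running a different interpreter, which would explain why the model is reported as missing even though pip list shows it.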