What is the difference between spacy.load('en_core_web_sm') and spacy.load('en')? This link explains the different model sizes, but I am still not clear on how spacy.load('en_core_web_sm') and spacy.load('en') differ.
spacy.load('en') runs fine for me, but spacy.load('en_core_web_sm') throws an error.
I have installed spaCy as shown below. When I open a Jupyter notebook and run nlp = spacy.load('en_core_web_sm'), I get the following error:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-4-b472bef03043> in <module>()
1 # Import spaCy and load the language library
2 import spacy
----> 3 nlp = spacy.load('en_core_web_sm')
4
5 # Create a Doc object
C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\spacy\__init__.py in load(name, **overrides)
13 if depr_path not in (True, False, None):
14 deprecation_warning(Warnings.W001.format(path=depr_path))
---> 15 return util.load_model(name, **overrides)
16
17
C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\spacy\util.py in load_model(name, **overrides)
117 elif hasattr(name, 'exists'): # Path or Path-like to model data
118 return load_model_from_path(name, **overrides)
--> 119 raise IOError(Errors.E050.format(name=name))
120
121
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
How I installed spaCy:
(C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder) C:\Users\nikhizzz>conda install -c conda-forge spacy
Fetching package metadata .............
Solving package specifications: .
Package plan for installation in environment C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder:
The following NEW packages will be INSTALLED:
blas: 1.0-mkl
cymem: 1.31.2-py35h6538335_0 conda-forge
dill: 0.2.8.2-py35_0 conda-forge
msgpack-numpy: 0.4.4.2-py_0 conda-forge
murmurhash: 0.28.0-py35h6538335_1000 conda-forge
plac: 0.9.6-py_1 conda-forge
preshed: 1.0.0-py35h6538335_0 conda-forge
pyreadline: 2.1-py35_1000 conda-forge
regex: 2017.11.09-py35_0 conda-forge
spacy: 2.0.12-py35h830ac7b_0 conda-forge
termcolor: 1.1.0-py_2 conda-forge
thinc: 6.10.3-py35h830ac7b_2 conda-forge
tqdm: 4.29.1-py_0 conda-forge
ujson: 1.35-py35hfa6e2cd_1001 conda-forge
The following packages will be UPDATED:
msgpack-python: 0.4.8-py35_0 --> 0.5.6-py35he980bc4_3 conda-forge
The following packages will be DOWNGRADED:
freetype: 2.7-vc14_2 conda-forge --> 2.5.5-vc14_2
Proceed ([y]/n)? y
blas-1.0-mkl.t 100% |###############################| Time: 0:00:00 0.00 B/s
cymem-1.31.2-p 100% |###############################| Time: 0:00:00 1.65 MB/s
msgpack-python 100% |###############################| Time: 0:00:00 5.37 MB/s
murmurhash-0.2 100% |###############################| Time: 0:00:00 1.49 MB/s
plac-0.9.6-py_ 100% |###############################| Time: 0:00:00 0.00 B/s
pyreadline-2.1 100% |###############################| Time: 0:00:00 4.62 MB/s
regex-2017.11. 100% |###############################| Time: 0:00:00 3.31 MB/s
termcolor-1.1. 100% |###############################| Time: 0:00:00 187.81 kB/s
tqdm-4.29.1-py 100% |###############################| Time: 0:00:00 2.51 MB/s
ujson-1.35-py3 100% |###############################| Time: 0:00:00 1.66 MB/s
dill-0.2.8.2-p 100% |###############################| Time: 0:00:00 4.34 MB/s
msgpack-numpy- 100% |###############################| Time: 0:00:00 0.00 B/s
preshed-1.0.0- 100% |###############################| Time: 0:00:00 0.00 B/s
thinc-6.10.3-p 100% |###############################| Time: 0:00:00 5.49 MB/s
spacy-2.0.12-p 100% |###############################| Time: 0:00:10 7.42 MB/s
(C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder) C:\Users\nikhizzz>python -V
Python 3.5.3 :: Anaconda custom (64-bit)
(C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder) C:\Users\nikhizzz>python -m spacy download en
Collecting en_core_web_sm==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
100% |################################| 37.4MB ...
Installing collected packages: en-core-web-sm
Running setup.py install for en-core-web-sm ... done
Successfully installed en-core-web-sm-2.0.0
Linking successful
C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\en_core_web_sm
-->
C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\spacy\data\en
You can now load the model via spacy.load('en')
(C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder) C:\Users\nikhizzz>
The answer lies in a Unix concept, soft links (symbolic links), which are roughly comparable to shortcuts on Windows. Let me explain.
When you run spacy download en, spaCy tries to find the best small model that matches your spaCy distribution. That small model defaults to en_core_web_sm, which is published in several variants that correspond to different spaCy versions (for example, the en_core_web_sm builds for spacy and spacy-nightly differ, including in size).
When spaCy finds the best model for you, it downloads it and then links the name en to the package it downloaded, e.g. en_core_web_sm. That basically means that whenever you refer to en, you are referring to en_core_web_sm. In other words, en after linking is not a "real" package; it is just another name for en_core_web_sm.
However, it does not automatically work the other way around. The error in your notebook says that spaCy could not resolve en_core_web_sm as a shortcut link, a Python package, or a path to a data directory, so from spaCy's point of view that name is unknown in the environment your notebook runs in, even though en resolves. Running spacy download en is essentially a pip install followed by a link step: as your log shows, pip installs the package under its real name, en_core_web_sm, and spaCy then creates the link en inside its own data directory. So en is never a pip package in its own right; it is only a soft link to en_core_web_sm.
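The error message in the question names three things spaCy tries for a model name: a shortcut link, a Python package, or a path to a data directory. As a rough illustration only (this is not spaCy's actual code), a simplified model of that lookup could look like the sketch below. The function name resolve_model_name and the link_dir parameter are hypothetical, and the demo uses a temporary directory as a stand-in for spaCy's real data directory.

```python
import importlib.util
import tempfile
from pathlib import Path


def resolve_model_name(name, link_dir):
    """Simplified, hypothetical model of the lookup order named in E050."""
    # 1. A shortcut link: an entry called `name` inside the link directory.
    link = Path(link_dir) / name
    if link.exists():
        return "shortcut link"
    # 2. An importable Python package called `name`.
    if importlib.util.find_spec(name) is not None:
        return "package"
    # 3. A path to a data directory on disk.
    if Path(name).is_dir():
        return "path"
    # Nothing matched: this is the situation in which spaCy raises E050.
    return None


# Demo with a throwaway directory standing in for spaCy's data directory.
with tempfile.TemporaryDirectory() as link_dir:
    (Path(link_dir) / "en").mkdir()  # pretend the link "en" was created
    found_link = resolve_model_name("en", link_dir)
    found_package = resolve_model_name("json", link_dir)  # stdlib package as a stand-in
    found_nothing = resolve_model_name("no_such_model_xyz", link_dir)

print(found_link, found_package, found_nothing)
```

To see which case applies in your own setup, you could run importlib.util.find_spec('en_core_web_sm') inside the notebook. If it returns None there, the Python interpreter behind the notebook cannot see the package.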
Of course, you can download en_core_web_sm directly with python -m spacy download en_core_web_sm, or you can link the name en to another model. For example, you could run python -m spacy download en_core_web_lg and then python -m spacy link en_core_web_lg en (add --force if the link en already exists). That would make en a name for en_core_web_lg, which is a large spaCy model for the English language.
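If you want notebook code that works whichever name happens to resolve, one option is to try the names in order and fall back on failure. The helper below is a minimal sketch, not part of spaCy: load_first_available and its loader parameter are hypothetical names. It relies only on spacy.load raising OSError for an unknown model, as the traceback in the question shows.

```python
def load_first_available(names, loader=None):
    """Try each model name in order and return (name, loaded_model).

    `loader` defaults to spacy.load; it is a parameter so the helper
    can be exercised without spaCy installed.
    """
    if loader is None:
        import spacy  # imported lazily, only when no loader is supplied
        loader = spacy.load
    errors = []
    for name in names:
        try:
            return name, loader(name)
        except OSError as err:
            # spaCy reports an unknown model as OSError [E050].
            errors.append("{}: {}".format(name, err))
    raise OSError("None of the models could be loaded:\n" + "\n".join(errors))
```

With spaCy installed, name, nlp = load_first_available(['en_core_web_sm', 'en']) would return the first name that loads together with its pipeline, so in the situation from the question it would fall back to en.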
Hope it is clear now :)