I'm trying to lemmatize Spanish text using the Spanish core model es_core_news_sm. However, I'm getting an OSError.
The following code is an example of lemmatization using spaCy on Google Colab:
import spacy
spacy.prefer_gpu()
nlp = spacy.load('es_core_news_sm')
text = 'yo canto, tú cantas, ella canta, nosotros cantamos, cantáis, cantan…'
doc = nlp(text)
lemmas = [tok.lemma_.lower() for tok in doc]
I also tried importing the model package directly, but that didn't work either and raised a similar traceback.
import es_core_news_sm
nlp = es_core_news_sm.load()
Traceback:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-93-fd65d69a4f87> in <module>()
2 spacy.prefer_gpu()
3
----> 4 nlp = spacy.load('es_core_web_sm')
5 text = 'yo canto, tú cantas, ella canta, nosotros cantamos, cantáis, cantan…'
6 doc = nlp(text)
1 frames
/usr/local/lib/python3.6/dist-packages/spacy/util.py in load_model(name, **overrides)
137 elif hasattr(name, "exists"): # Path or Path-like to model data
138 return load_model_from_path(name, **overrides)
--> 139 raise IOError(Errors.E050.format(name=name))
140
141
OSError: [E050] Can't find model 'es_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
You first need to download the model data:
!spacy download es_core_news_sm
Also note that your traceback shows spacy.load('es_core_web_sm'): make sure you load 'es_core_news_sm' (with "news", not "web"). Then restart the runtime, after which your code will run correctly:
import spacy
spacy.prefer_gpu()
nlp = spacy.load('es_core_news_sm')
text = 'yo canto, tú cantas, ella canta, nosotros cantamos, cantáis, cantan…'
doc = nlp(text)
lemmas = [tok.lemma_.lower() for tok in doc]
print(len(lemmas))
16
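If you would rather guard against the missing-model error in code instead of remembering the download step, here is a minimal sketch. It assumes spacy.cli.download is available (it is in spaCy 2.x and 3.x); note that on Colab a runtime restart may still be needed before a freshly installed model package becomes importable.

```python
import spacy
from spacy.cli import download

MODEL = "es_core_news_sm"

try:
    nlp = spacy.load(MODEL)
except OSError:
    # Model package not installed yet: download it, then retry.
    # On Colab a runtime restart may still be required before the
    # newly installed package can be imported in the same session.
    download(MODEL)
    nlp = spacy.load(MODEL)

doc = nlp("yo canto, tú cantas, ella canta")
lemmas = [tok.lemma_.lower() for tok in doc]
```

This way the download only runs when the model is actually missing, so re-running the cell after a restart goes straight to spacy.load.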