Tags: machine-learning, nlp, data-science, spacy

Where can I find the Python 2 version of en_core_web_sm for spaCy in Python 3?


I need to repeat an experiment that was originally run with Python 2.7 and spaCy 1.8.2. The following snippet gives different output in my new environment:

for raw_doc in spam + ham:
    doc = self.nlp(raw_doc)
    docs.append(' '.join(
        [token.lemma_ for token in doc if (token.is_alpha and not (token.is_oov or token.is_stop))]))

In the Python 3.7 environment, token.is_oov is always True, so the filter rejects every token and each joined string is empty. With spaCy 1.8.2 the same code gives reasonable results. The model's vocabulary matters here, because I need to reproduce the original run exactly.
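
One way to confirm the symptom is to measure how many alphabetic tokens a pipeline flags as out-of-vocabulary. The following is only a minimal sketch: the helper name oov_fraction is hypothetical, and the stand-in tokens are there so the block runs without spaCy installed. The commented lines at the end assume spaCy 2.x with an en_core_web_sm package installed.

```python
from types import SimpleNamespace


def oov_fraction(tokens):
    """Return the share of alphabetic tokens flagged as out-of-vocabulary."""
    alpha = [t for t in tokens if t.is_alpha]
    if not alpha:
        return 0.0
    return sum(1 for t in alpha if t.is_oov) / len(alpha)


# Stand-in tokens (not real spaCy objects) that mimic the reported symptom:
# every token has is_oov set to True.
fake_doc = [
    SimpleNamespace(is_alpha=True, is_oov=True),
    SimpleNamespace(is_alpha=True, is_oov=True),
    SimpleNamespace(is_alpha=False, is_oov=True),
]
print(oov_fraction(fake_doc))

# With spaCy 2.x and a model installed, the same check would look like this
# (assumption: the model package is named en_core_web_sm):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   print(spacy.__version__, nlp.meta.get("version"))
#   print(oov_fraction(nlp("Some example text to inspect.")))
```

A fraction of 1.0 on ordinary English text would suggest that the installed model, rather than the filtering code, is the source of the difference.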

Now I would like to repeat the experiment with Python 3.7 and spaCy 2.3. What can I do?


I have to admit that my question was not well posed. I made mistakes when installing the spaCy language package. The spacy-models repository (https://github.com/explosion/spacy-models) is a good reference.


Solution

  • Try installing the older en_core_web_sm model, version 1.2.0, which was used with the older spaCy library. It is available from https://github.com/explosion/spacy-models, where all the old spaCy models, including the en_core.. models, are archived.
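
As a rough sketch of the step above, a specific archived model version can be installed with pip from its release archive. The URL layout built by model_url below is an assumption about how release assets in the spacy-models repository are named, and both helper names are hypothetical, so check the repository's release page for the exact link before relying on it.

```python
import subprocess
import sys

# Assumed base URL for release assets in explosion/spacy-models.
RELEASES = "https://github.com/explosion/spacy-models/releases/download"


def model_url(name, version):
    """Build the assumed download URL for one model release archive."""
    tag = f"{name}-{version}"
    return f"{RELEASES}/{tag}/{tag}.tar.gz"


def install_model(name, version):
    """Install the model into the current interpreter's environment via pip."""
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", model_url(name, version)]
    )


# Only prints the URL; call install_model("en_core_web_sm", "1.2.0")
# explicitly to perform the installation.
print(model_url("en_core_web_sm", "1.2.0"))
```

Because the 1.2.0 model was built for the older spaCy library, it is probably safest to install it alongside the spaCy version used in the original experiment rather than alongside spaCy 2.3.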