
How to install a language model


I am exploring using NLP for some machine learning projects. I normally write all of my projects in Python through Anaconda, using either Jupyter notebooks or PyCharm as my IDE.

I would like to start using spaCy and am planning to attend a workshop on it in the near future. The organizers recommended two things to do beforehand: install spaCy, and install the en_core_web_lg language model. I completed the first step by searching for the spacy package in my Anaconda environment (the conventional way) and installing it. However, I am less familiar with how to get the language model onto my computer, since it is not a traditional package.

The spaCy models page (https://spacy.io/models/en#en_core_web_lg) says that this language model can be installed with:

INSTALLATION

$ python -m spacy download en_core_web_lg

I assume this is a terminal command? I am not very experienced with the terminal, but I tried typing the above command into a command line, pressed Enter, and nothing happened. Is this the correct way to install the model? If not, how should I install it? Also, for pedagogical purposes, what exactly happens when we install the model? Does it get stored on our computer so that it can then be loaded for NLP in, say, a Jupyter notebook?

Sorry if these questions seem fairly basic, I am still trying to learn these new techniques. Any help, references, or advice would be greatly appreciated.

Thanks.


Solution

  • Make sure to activate your environment using virtualenv or conda and install spaCy as @Aris mentioned.

    To install spaCy:

    pip install -U spacy
    

    To install a specific model, run the following command with the model name (for example en_core_web_lg):

    python -m spacy download [model]
    
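    A common reason the command "does nothing" or the model later cannot be found is that it was run with a different Python than the one your notebook uses. A minimal sketch, assuming you want to trigger the download from inside Jupyter: it runs the same `python -m spacy download` command through `subprocess`, using `sys.executable` so the model lands in the notebook kernel's own environment. The helper names `download_command` and `download_model` are hypothetical, chosen for illustration.

```python
import subprocess
import sys
from typing import List


def download_command(model: str) -> List[str]:
    # sys.executable is the interpreter running this code (e.g. the Jupyter
    # kernel), so the model is installed into that same environment.
    return [sys.executable, "-m", "spacy", "download", model]


def download_model(model: str = "en_core_web_lg") -> None:
    # Equivalent to typing `python -m spacy download <model>` in a terminal
    # where this environment is active; check=True raises
    # subprocess.CalledProcessError if the download fails.
    subprocess.run(download_command(model), check=True)
```

    Calling `download_model("en_core_web_lg")` in a notebook cell should print pip's progress output; depending on your setup you may need to restart the kernel before the freshly installed model can be imported.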

    To load a model, use spacy.load() with the model name, a shortcut link (spaCy v2 only; shortcut links were removed in v3), or a path to the model data directory.

    import spacy
    # Load the model installed with `python -m spacy download en_core_web_lg`
    nlp = spacy.load("en_core_web_lg")
    doc = nlp("This is a sentence.")
    
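    Because a downloaded model is installed as an ordinary Python package (which is why `import en_core_web_lg` works), you can check for that package before calling spacy.load() and give a targeted hint when it is missing. A minimal sketch; `load_model` is a hypothetical helper, and the check only covers models installed as packages, not models loaded from a directory path.

```python
import importlib.util


def load_model(name: str = "en_core_web_lg"):
    # find_spec returns None when no importable package with this name
    # exists in the current environment.
    if importlib.util.find_spec(name) is None:
        raise RuntimeError(
            f"Model {name!r} is not installed in this environment; "
            f"run: python -m spacy download {name}"
        )
    import spacy  # imported lazily so the check works even without spaCy

    return spacy.load(name)
```

    If the error appears even though you ran the download, the model was most likely installed into a different environment than the one running your code.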

    You can also import a model directly via its full name and then call its load() method with no arguments. This should also work for older models in previous versions of spaCy.

    import spacy
    import en_core_web_lg
    
    nlp = en_core_web_lg.load()
    doc = nlp("This is a sentence.")
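    As for what the install actually does: `python -m spacy download` uses pip to install the model as a regular Python package, so it sits in your environment's site-packages alongside spaCy itself. Each model release is built for particular spaCy versions, so it can help to see which versions you have; `python -m spacy validate` checks model compatibility for you. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The model shows up like any other pip-installed package.
    for name in ("spacy", "en_core_web_lg"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```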