python-3.x  google-cloud-platform  jupyter-notebook  spacy

How should I install the English model of spaCy in a Jupyter notebook that runs on a Google Cloud instance?


I am trying to use the English model of spaCy in a Jupyter notebook (Python 3) that runs on a Google Cloud instance. I have installed spaCy, but I cannot install or import its English model.

I have already tried the following commands:

!pip3 install en_core_web_sm
!python -m spacy download en

and many other commands, but none of them worked, and each time I got a different error. spaCy works without any trouble on my local machine, but I do not know how to install the English model in a Jupyter notebook that runs in the cloud. Any suggestions? Thanks!

FYI: when I run !pip3 install en_core_web_sm, I get the following error:

Collecting spacy-model-en_core_web_sm
Exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python3/dist-packages/pip/commands/install.py", line 353, in run
    wb.build(autobuilding=True)
  File "/usr/lib/python3/dist-packages/pip/wheel.py", line 749, in build
    self.requirement_set.prepare_files(self.finder)
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 554, in _prepare_file
    require_hashes
  File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 278, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 465, in find_requirement
    all_candidates = self.find_all_candidates(req.name)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 423, in find_all_candidates
    for page in self._get_pages(url_locations, project_name):
  File "/usr/lib/python3/dist-packages/pip/index.py", line 568, in _get_pages
    page = self._get_page(location)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 683, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 795, in get_page
    resp.raise_for_status()
  File "/usr/share/python-wheels/requests-2.12.4-py2.py3-none-any.whl/requests/models.py", line 893, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/spacy-model-en-core-web-sm/

Solution

  • I found the answer thanks to @Dustin Ingram. The command to run is:

    !python3 -m spacy download en_core_web_sm
    

    If you use Python 2, drop the "3" from python3 in the command above.

    You can also run the download from Python itself, for example in a Jupyter notebook cell:

    import spacy
    spacy.cli.download('en_core_web_sm')
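On a cloud instance, python3 on the PATH is not always the interpreter the notebook kernel runs, so the model can end up in a different environment from the one that imports spaCy. The following is a minimal sketch, not part of the original answer, that builds the same download command around sys.executable (the interpreter running the current kernel). The helper names build_download_command and ensure_model are hypothetical, and the sketch assumes spaCy itself is already installed for that interpreter.

```python
import importlib.util
import subprocess
import sys


def build_download_command(model="en_core_web_sm"):
    """Return the `spacy download` command for the interpreter running this code."""
    # sys.executable is the Python binary behind the current kernel, so the
    # model is installed into the same environment that imports spaCy.
    return [sys.executable, "-m", "spacy", "download", model]


def ensure_model(model="en_core_web_sm"):
    """Download `model` if it is not importable; return True if a download ran.

    Hypothetical helper: spaCy models are installed as regular Python
    packages, so find_spec() is used here as a cheap "is it installed" check.
    """
    if importlib.util.find_spec(model) is not None:
        return False  # model package already present, nothing to do
    # Raises subprocess.CalledProcessError if the download command fails.
    subprocess.check_call(build_download_command(model))
    return True
```

In a notebook cell, ensure_model() could be called once before spacy.load('en_core_web_sm'). If the load still fails right after a fresh download, restarting the kernel may be needed so that the newly installed package is picked up.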