Using spaCy and scispacy on a Slurm cluster, NOT Colab.
The smaller model (en_core_sci_sm) works fine; the large model throws an error.
Python version - 3.9.2
pip list includes -
en-core-sci-sm 0.5.1
scipy 1.11.2
scispacy 0.5.2
Code, as in the scispacy example:
import spacy

if __name__ == '__main__':
    # Load the large scispacy model (this is the call that raises E050)
    nlp = spacy.load("en_core_sci_lg")
    doc = nlp("Alterations in the hypocretin receptor 2 and preprohypocretin genes produce narcolepsy in some animals.")
Error:
File "/dir/../file.py", line 58, in main
nlp = spacy.load("en_core_sci_lg")
File "/dir/../venv/lib/python3.9/site-packages/spacy/__init__.py", line 54, in load
return util.load_model(
File "/dir/../venv/lib/python3.9/site-packages/spacy/util.py", line 439, in load_model
raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_sci_lg'. It doesn't seem to be a Python package or a valid path to a data directory.
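E050 is raised when the name passed to spacy.load is neither an installed Python package nor a path to a model directory, and the pip list above shows only en-core-sci-sm. A minimal sketch of a pre-check that confirms which model packages are actually present in the venv, using only the standard library (the helper name model_is_installed is hypothetical, not part of spaCy):

```python
import importlib.util


def model_is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    # spaCy model packages are installed as ordinary Python packages,
    # so find_spec() can locate them without importing spaCy itself.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    for name in ("en_core_sci_sm", "en_core_sci_lg"):
        status = "installed" if model_is_installed(name) else "NOT installed"
        print(f"{name}: {status}")
```

Run inside the same venv the job uses; if en_core_sci_lg is reported as not installed, the load error comes from the failed pip install rather than from spaCy itself.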
Tried so far - updating pip, spaCy, and scispacy. Then, in the venv -
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz
Error -
Collecting https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz
Downloading https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz (532.3 MB)
|████████████████████▋ | 343.7 MB 2.4 MB/s eta 0:01:18
sda3: write failed, user block limit reached.
ERROR: Could not install packages due to an EnvironmentError: [Errno 122] Disk quota exceeded
pip install --cache-dir=/..file_dir/.cache/ https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz
Same error (Disk quota exceeded) and same result. When running df -h I have more than 2T free, so I'm not sure why I don't have the space.
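One possible explanation: Errno 122 and "user block limit reached" refer to a per-user quota, while df -h reports free space for the whole filesystem, so the two can disagree. pip may also write temporary files outside the directory given to --cache-dir. A rough sketch for checking which filesystems the relevant directories sit on; the helper name free_space_gib and the choice of candidate directories are assumptions, not something from the pip or scispacy docs:

```python
import os
import shutil
import tempfile


def free_space_gib(paths):
    """Map each existing path to the free space (GiB) of its filesystem."""
    report = {}
    for path in paths:
        if os.path.exists(path):
            # disk_usage reports filesystem-wide free space, like df;
            # it does not know about per-user quotas.
            report[path] = shutil.disk_usage(path).free / 2**30
    return report


if __name__ == "__main__":
    candidates = [
        os.path.expanduser("~"),  # home directory (assumed to be the quota-limited one)
        tempfile.gettempdir(),    # honours TMPDIR; temporary files land here
        os.getcwd(),              # current project directory
    ]
    for path, gib in free_space_gib(candidates).items():
        print(f"{path}: {gib:.1f} GiB free")
```

This only shows where the large free volume is mounted relative to the home and temp directories. The quota itself has to be checked with whatever tool the cluster provides (for example quota -s, where available).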
In case this is useful to someone in the future -
I ended up doing 3 things that helped me -
1. Set and export HF_HOME:
HF_HOME=$PWD/.hf_cach
export HF_HOME
2. Reboot the computer
3. Inside my code, downloaded the model again at run time:
import spacy.cli
spacy.cli.download("en_core_web_lg")
nlp = spacy.load("en_core_web_lg")
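The HF_HOME step can also be done from inside the script, before any library that reads the variable is imported. A minimal sketch, assuming the same .hf_cach directory name as above (the helper name set_hf_home is hypothetical):

```python
import os


def set_hf_home(base_dir: str, name: str = ".hf_cach") -> str:
    """Point HF_HOME at base_dir/name, creating the directory if needed."""
    cache_dir = os.path.join(base_dir, name)
    os.makedirs(cache_dir, exist_ok=True)
    # Affects this process and any child processes it starts.
    os.environ["HF_HOME"] = cache_dir
    return cache_dir


if __name__ == "__main__":
    # Equivalent of: HF_HOME=$PWD/.hf_cach; export HF_HOME
    print(set_hf_home(os.getcwd()))
```

Setting it in the script keeps the cache location with the job, which can be convenient on Slurm where the submitting shell's environment is not always the one the job runs in.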