Tags: amazon-web-services, amazon-ec2, emr, amazon-emr

Downloading nltk in AWS EMR gives '[Errno 28] No space left on device'


I'm getting a 'No space left on device' error while running the code below on an Amazon EMR cluster.

import nltk
nltk.download('all')

I checked the available memory from the command line; below is a screenshot of the available memory in the cluster. Please guide me.


Solution

  • By default, the NLTK downloader saves its data to the directory /usr/share/nltk_data/ on Unix/Linux-based operating systems (when that directory is writable).

    Download the data to a different location that has sufficient disk space and write access.

    python -m nltk.downloader -d /mnt/nltk_data all
    

    Because the data is no longer in the default location, set the NLTK_DATA environment variable so that NLTK can find it.

    export NLTK_DATA=/mnt/nltk_data
    

    Your instance seems to have run out of disk space on the root volume: the / directory is 100% full. Free some disk space there before proceeding.
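The same steps can also be done from inside Python, which may be convenient in a notebook or a bootstrap script. The following is only a minimal sketch under stated assumptions: the helper names (free_gib, pick_download_dir, download_nltk_data) are hypothetical, /mnt is assumed to be a larger writable volume as in the commands above, and the 5 GiB threshold is an arbitrary placeholder rather than a measured size of the 'all' collection. It relies on nltk.download(..., download_dir=...) and the nltk.data.path search list.

```python
import os
import shutil


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def pick_download_dir(candidates, min_free_gib):
    """Return the first existing, writable directory with enough free space.

    Returns None when no candidate qualifies.
    """
    for path in candidates:
        if (
            os.path.isdir(path)
            and os.access(path, os.W_OK)
            and free_gib(path) >= min_free_gib
        ):
            return path
    return None


def download_nltk_data(target_dir, package="all"):
    """Download an NLTK package into `target_dir` and add it to NLTK's search path."""
    import nltk  # imported lazily so the helpers above work without NLTK installed

    os.makedirs(target_dir, exist_ok=True)
    nltk.download(package, download_dir=target_dir)
    # Equivalent in effect to `export NLTK_DATA=...` for the current process.
    if target_dir not in nltk.data.path:
        nltk.data.path.append(target_dir)


# Example usage (assumed paths and threshold, adjust for your cluster):
#
#   print(f"/ has {free_gib('/'):.1f} GiB free")
#   base = pick_download_dir(["/mnt"], min_free_gib=5)
#   if base is not None:
#       download_nltk_data(os.path.join(base, "nltk_data"))
```

Appending to nltk.data.path only affects the running Python process, so the NLTK_DATA export is still needed for other processes and new sessions.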