amazon-web-services, amazon-ec2, amazon-emr

Saving Files on EMR EC2 Instances


I'm having disk space issues when downloading files from S3 to my EMR nodes. I'm using c3.4xlarge nodes, which are supposed to have 160GB of space, yet when I use addFile in PySpark to send the files (eight 450MB files), I get "No space left on device" errors.

Any idea why this is happening?

I notice a similar issue when downloading the files via the AWS CLI on the master node.

What's going on?


Solution

  • Are you sure you are placing the files on the partition that actually has all that space? I believe you need to copy them to the /mnt directory. Running df -H on one of the servers will list each mounted filesystem with its size and free space, so you can see where your space is.
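
The same check that df -H performs can be sketched from Python, which may be handy if you want to confirm the target directory before downloading. This is only a minimal sketch: the helper names free_space_by_path and pick_roomiest are made up for illustration, and the candidate paths other than /mnt are assumptions, so adjust them to whatever df -H reports on your nodes.

```python
import os
import shutil


def free_space_by_path(paths):
    """Return {path: free_bytes} for every path that exists as a directory."""
    report = {}
    for path in paths:
        if os.path.isdir(path):
            # disk_usage reports totals for the filesystem that holds `path`
            report[path] = shutil.disk_usage(path).free
    return report


def pick_roomiest(paths):
    """Return the existing path with the most free space, or None if none exist."""
    report = free_space_by_path(paths)
    if not report:
        return None
    return max(report, key=report.get)


if __name__ == "__main__":
    # "/mnt" comes from the answer above; "/" and "/tmp" are assumed candidates.
    candidates = ["/", "/mnt", "/tmp"]
    for path, free in free_space_by_path(candidates).items():
        print(f"{path}: {free / 1e9:.1f} GB free")
    print("Most free space:", pick_roomiest(candidates))
```

If the root filesystem turns out to be the small one, pointing the download at the directory this reports (for example /mnt) should avoid the error, assuming that directory is writable by your user.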