
Elastic MapReduce and Amazon S3: Error regarding access keys


I am new to Amazon EMR and Hadoop in general. I am currently trying to set up a Pig job on an EMR cluster and to import and export data from S3. I have set up an S3 bucket named "datastackexchange" that holds my data. As a first step towards copying the data into Pig, I ran the following command:

ls s3://datastackexchange

And I am met with the following error message:

AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).

I presume I am missing some critical steps, most likely involving setting up the access keys. As I am very new to EMR, could someone please explain what I need to do to get rid of this error and use my S3 data in EMR?

Any help is greatly appreciated - thank you.


Solution

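  • As you correctly observed, your EMR instances do not have the privileges to access the S3 data. There are several ways to supply AWS credentials for S3 access (the error message itself names two: embedding the keys in the s3 URL, or setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties), but the recommended way is to create IAM role(s) that grant the cluster access to your S3 data, so that no keys have to be stored in URLs or configuration.

    The AWS documentation page "Configure IAM Roles for Amazon EMR" explains the steps involved.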
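As a rough illustration of what the IAM-role approach involves, here is a minimal Python sketch that builds a policy document granting access to the bucket from the question and, optionally, attaches it to the role used by the cluster's EC2 instances. It is a sketch under stated assumptions, not the documented procedure: the helper names (s3_bucket_policy, attach_policy), the policy name, and the choice of S3 actions are illustrative, and it assumes the cluster uses the default instance role EMR_EC2_DefaultRole and that boto3 is installed and configured with credentials allowed to modify IAM.

```python
import json


def s3_bucket_policy(bucket):
    """Build an IAM policy document for listing, reading and writing one bucket.

    Hypothetical helper for illustration; trim the actions to what the job needs.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading and writing apply to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


def attach_policy(role_name, policy_name, policy):
    """Attach the policy inline to an existing IAM role (makes a real AWS call)."""
    import boto3  # imported here so the policy can be built without boto3 installed

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )


if __name__ == "__main__":
    policy = s3_bucket_policy("datastackexchange")
    print(json.dumps(policy, indent=2))
    # Assumption: the cluster runs with the default EMR instance role.
    # Uncomment to apply the policy for real:
    # attach_policy("EMR_EC2_DefaultRole", "datastackexchange-access", policy)
```

Once the instance role carries a policy like this, the s3://datastackexchange path should be reachable from the cluster without putting access keys in the URL or in the Hadoop configuration.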