java hadoop amazon-s3 elastic-map-reduce

How to read a file from s3 in EMR?


I would like to read a file from S3 in my EMR Hadoop job. I am using the Custom JAR option.

I have tried two solutions:

  • org.apache.hadoop.fs.S3FileSystem: throws a NullPointerException.
  • com.amazonaws.services.s3.AmazonS3Client: throws an exception, saying "Access denied".

What I fail to grasp is that I am starting the job from the Console, so obviously I should have the necessary permissions. However, the AWS_*_KEY keys are missing from the environment variables (System.getenv()) that are available to the mapper.
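To see exactly what a task can see, a small diagnostic like the one below could be called from the mapper's setup code and its output checked in the task logs. This is only a sketch: the class name `AwsEnvCheck` and its method are made up for illustration, and it deliberately masks values so that no secret ends up in a log.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Hadoop or the AWS SDK): lists which
// AWS_-prefixed variables exist in an environment without printing their values.
class AwsEnvCheck {

    // Returns the names of all variables starting with "AWS_", sorted,
    // each mapped to a fixed placeholder instead of the real value.
    static Map<String, String> awsVariables(Map<String, String> env) {
        Map<String, String> found = new TreeMap<>();
        for (Map.Entry<String, String> entry : env.entrySet()) {
            if (entry.getKey().startsWith("AWS_")) {
                found.put(entry.getKey(), "<set>");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Inside a mapper, the same call could go into setup() and be logged there.
        System.out.println("AWS variables visible here: " + awsVariables(System.getenv()));
    }
}
```

An empty result inside the mapper would confirm that the task is not receiving credentials through environment variables, so they would have to come from somewhere else.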

I am sure I am doing something wrong, I am just not sure what.


Solution

  • I think your EMR cluster needs to have access to S3. You can create an IAM role for the cluster and grant that role access to S3. See this link: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles.html
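Once the role grants access, the job still has to name the object it wants. Whichever client is used, it helps to split the `s3://` path into bucket and key first. Below is a minimal sketch using only `java.net.URI`; the class `S3Location` and the example path are hypothetical, and how the result is passed on (for instance to `AmazonS3Client.getObject(bucket, key)`, or the full URI to Hadoop's `FileSystem.get(uri, conf)`) is an assumption about your setup rather than something the answer above prescribes.

```java
import java.net.URI;

// Hypothetical value class: holds the bucket and key parts of an S3 URI.
class S3Location {
    final String bucket;
    final String key;

    S3Location(String bucket, String key) {
        this.bucket = bucket;
        this.key = key;
    }

    // Splits a URI such as "s3://my-bucket/input/data.txt" into bucket and key.
    // Accepts any scheme starting with "s3" (s3, s3n, s3a).
    static S3Location parse(String uri) {
        URI parsed = URI.create(uri);
        String scheme = parsed.getScheme();
        if (scheme == null || !scheme.startsWith("s3")) {
            throw new IllegalArgumentException("Not an S3 URI: " + uri);
        }
        // getAuthority() is used instead of getHost() because getHost()
        // returns null for names that are not valid hostnames.
        String bucket = parsed.getAuthority();
        if (bucket == null || bucket.isEmpty()) {
            throw new IllegalArgumentException("Missing bucket in: " + uri);
        }
        String path = parsed.getPath() == null ? "" : parsed.getPath();
        // Drop the leading slash so the key matches what S3 expects.
        String key = path.startsWith("/") ? path.substring(1) : path;
        return new S3Location(bucket, key);
    }

    public static void main(String[] args) {
        S3Location location = S3Location.parse("s3://my-bucket/input/data.txt");
        System.out.println(location.bucket + " / " + location.key);
    }
}
```

If the "Access denied" error persists after the role is in place, checking that the parsed bucket and key match what the role's policy allows is a reasonable next step.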