amazon-web-services hadoop amazon-s3 emr amazon-iam

Trouble integrating EMR with S3


I am having trouble integrating EMR with S3, i.e., getting EMRFS to work.

EMR Version: emr-5.4.0

When I run hdfs dfs -ls s3://pathto/bucket/ I get the following error:

ls: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: XXXX), S3 Extended Request ID: XXXXX

What does this error mean, and what am I missing?

I have done the following steps:

  1. Created a KMS key for EMR
  2. Added EMR_EC2_DefaultRole as a key user on the newly created KMS key
  3. Created an EMR security configuration with S3 server-side encryption
  4. Created a new inline policy on both EMR_EC2_DefaultRole and EMR_DefaultRole for S3 bucket access
  5. Created an EMR cluster manually with the new EMR security configuration and the following configuration classification:

    "fs.s3.enableServerSideEncryption": "true",
    "fs.s3.serverSideEncryption.kms.keyId":"KEYID"
    

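For context, here is a minimal sketch of how the two properties above would typically be wrapped in a full configuration classification. It assumes the `emrfs-site` classification, which the post does not name explicitly; the function name `emrfs_sse_kms_classification` is hypothetical, and `KEYID` is the placeholder from the post, not a real key.

```python
import json


def emrfs_sse_kms_classification(kms_key_id: str) -> list:
    """Build an EMR configuration list enabling SSE-KMS for EMRFS.

    Assumption: the properties belong to the "emrfs-site" classification.
    """
    return [
        {
            "Classification": "emrfs-site",
            "Properties": {
                # EMR expects property values as strings, hence "true".
                "fs.s3.enableServerSideEncryption": "true",
                "fs.s3.serverSideEncryption.kms.keyId": kms_key_id,
            },
        }
    ]


if __name__ == "__main__":
    # "KEYID" is the placeholder used in the post.
    print(json.dumps(emrfs_sse_kms_classification("KEYID"), indent=2))
```

The printed JSON is the shape that would be pasted into the cluster's software settings when creating it manually.
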
Solution

  • By default, EMR uses the instance profile credentials (EMR_EC2_DefaultRole) to access your S3 bucket. The error means this role does not have the permissions needed to access the bucket.

    Verify that the IAM policy attached to that role allows the necessary S3 actions on both the bucket and its objects (for example, s3:List*). Also check for any explicit Deny statements. http://docs.aws.amazon.com/AmazonS3/latest/dev/using-with-s3-actions.html

    Access could also be denied by a bucket policy set on the S3 bucket you are trying to access. http://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html https://aws.amazon.com/blogs/security/iam-policies-and-bucket-policies-and-acls-oh-my-controlling-access-to-s3-resources/

    Your EMR cluster could also be reaching S3 through a VPC endpoint rather than the Internet/NAT. In that case, you also need to verify the VPC endpoint policy. https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-s3.html#vpc-endpoints-policies-s3
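
To make the "bucket and objects" point concrete, here is a minimal sketch that builds an inline policy with one statement for bucket-level actions and one for object-level actions, plus a small helper that lists explicit Deny statements in any policy document (role, bucket, or VPC endpoint policy). The bucket name `example-bucket`, the function names, and the specific action list are assumptions for illustration; adjust them to what your jobs actually need. The helper only surfaces Deny statements for manual review and does not reproduce IAM's full evaluation logic.

```python
import json


def s3_access_policy(bucket: str) -> dict:
    """Return an IAM policy document granting basic access to one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "BucketLevel",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to keys under the bucket.
                "Sid": "ObjectLevel",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


def explicit_denies(policy: dict) -> list:
    """List statements with Effect == "Deny" so they can be reviewed by hand."""
    statements = policy.get("Statement", [])
    # A policy may hold a single statement object instead of a list.
    if isinstance(statements, dict):
        statements = [statements]
    return [s for s in statements if s.get("Effect") == "Deny"]


if __name__ == "__main__":
    # "example-bucket" is a hypothetical bucket name.
    print(json.dumps(s3_access_policy("example-bucket"), indent=2))
```

A common cause of this 403 is granting `s3:ListBucket` on `arn:aws:s3:::bucket/*` instead of on the bucket ARN itself, which is why the two statements use different resources.
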