hadoop, amazon-web-services, amazon-s3, hdfs, distcp

AWS instance distcp to s3 - Access keys


If I have an EC2 instance created with a role, what is the best practice way to get access keys to do a distcp from hdfs to s3?

I don't want to be sending access keys to the instance using our automated deployment tools because that would mean storing the keys in plain sight. Is there a way for the instance to request a set of keys using the CLI?

I need them for: hadoop distcp /data s3n://<access_key>:<secret_key>@mybucket/backup/data


Solution

  • When an Amazon EC2 instance is launched with a Role, the instance is given access to temporary security credentials via the Instance Metadata Service. It works as follows:

    • A Role is created in Identity and Access Management (IAM)
    • Appropriate permissions are granted to the role
    • An Amazon EC2 instance is launched with that Role selected. If you are using Amazon Elastic MapReduce (EMR) to launch the instances, then they are typically assigned an EMR-specific role (which you can modify)
    • A set of temporary security credentials is made available to the instance via the URL http://169.254.169.254/latest/meta-data/iam/security-credentials/ (a request to this URL returns the role name; appending the role name to the URL returns the credentials as a JSON document)

    Software that calls the AWS SDK knows to automatically look at this URL to retrieve security credentials. If the software you are using does not automatically look at this URL, you can extract them and pass them to the software.
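
    As a rough illustration of extracting the credentials and passing them on, here is a minimal Python sketch. It rests on several assumptions that are not in the answer above: the function names are hypothetical, the request is a plain unauthenticated GET (instances that enforce IMDSv2 need a session token first), and the distcp arguments assume the S3A connector with s3a:// URLs and its temporary-credentials settings, since temporary credentials come with a session token that must be passed along with the two keys.

```python
import json
import urllib.request

# Metadata URL quoted in the answer above; the role name is appended to it.
METADATA_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def fetch_text(url, timeout=2):
    """GET a URL and return the body as text (plain request, no IMDSv2 token)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_credentials(document):
    """Pick the three fields a client needs out of the credentials JSON."""
    data = json.loads(document)
    return {
        "access_key": data["AccessKeyId"],
        "secret_key": data["SecretAccessKey"],
        "session_token": data["Token"],
    }


def get_role_credentials(fetch=fetch_text):
    """Look up the role attached to this instance and fetch its temporary keys."""
    role_name = fetch(METADATA_URL).split()[0]  # first role name listed
    return parse_credentials(fetch(METADATA_URL + role_name))


def build_distcp_command(creds, source, destination):
    """Build a distcp argument list (assumption: S3A connector, s3a:// URLs)."""
    return [
        "hadoop", "distcp",
        "-Dfs.s3a.aws.credentials.provider="
        "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
        "-Dfs.s3a.access.key=" + creds["access_key"],
        "-Dfs.s3a.secret.key=" + creds["secret_key"],
        "-Dfs.s3a.session.token=" + creds["session_token"],
        source,
        destination,
    ]
```

    On the instance, something like build_distcp_command(get_role_credentials(), "/data", "s3a://mybucket/backup/data") would produce an argument list that can be handed to subprocess.run. The keys are fetched at run time and never stored by the deployment tooling, but they do expire, so they should be fetched fresh for each job.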

    See: Retrieving Security Credentials from Instance Metadata