Tags: amazon-web-services, hadoop, amazon-s3, sqoop

Specify the AWS credentials in Hadoop


I want to specify AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY at runtime.

I already tried using

hadoop -Dfs.s3a.access.key=${AWS_ACESS_KEY_ID} -Dfs.s3a.secret.key=${AWS_SECRET_ACCESS_KEY} fs -ls s3a://my_bucket/

and

export HADOOP_CLIENT_OPTS="-Dfs.s3a.access.key=${AWS_ACCESS_KEY_ID} -Dfs.s3a.secret.key=${AWS_SECRET_ACCESS_KEY}"

and

export HADOOP_OPTS="-Dfs.s3a.access.key=${AWS_ACCESS_KEY_ID} -Dfs.s3a.secret.key=${AWS_SECRET_ACCESS_KEY}"

In the last two examples, I tried to run with:

hadoop fs -ls s3a://my-bucket/

In all cases I got:

-ls: Fatal internal error
com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
        at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
        at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
        at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325)
        at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
        at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
        at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)

What am I doing wrong?


Solution

  • This is the correct way to pass the credentials at runtime:

    hadoop fs -Dfs.s3a.access.key="${AWS_ACCESS_KEY_ID}" -Dfs.s3a.secret.key="${AWS_SECRET_ACCESS_KEY}" -ls s3a://my-bucket/
    

    Your syntax needs two small fixes. First, the -D definitions must come after the fs command and before the subcommand, since they are parsed as generic options by the FsShell, not by the hadoop launcher script. Second, ${AWS_ACESS_KEY_ID} in your first attempt is misspelled (it is missing a C), so it expands to an empty string; empty values make these runtime properties invalid, and S3A moves on down its authentication chain. The HADOOP_OPTS and HADOOP_CLIENT_OPTS attempts fail for a related reason: they set JVM system properties, which the Hadoop Configuration does not read.
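
    The same pattern works for any FsShell subcommand, as long as the -D definitions sit between fs and the subcommand. A sketch with a hypothetical copy out of the bucket (the paths are placeholders):

    # Generic -D options first, then the subcommand and its arguments.
    hadoop fs \
        -Dfs.s3a.access.key="${AWS_ACCESS_KEY_ID}" \
        -Dfs.s3a.secret.key="${AWS_SECRET_ACCESS_KEY}" \
        -cp s3a://my-bucket/input.csv hdfs:///tmp/input.csv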

    The S3A client works through the following authentication chain:

    1. If login details were provided in the filesystem URI, a warning is printed and the username and password are extracted as the AWS key and secret respectively.
    2. The fs.s3a.access.key and fs.s3a.secret.key properties are looked up in the Hadoop XML configuration.
    3. The AWS environment variables are looked for next.
    4. Finally, an attempt is made to query the Amazon EC2 Instance Metadata Service for credentials published to EC2 VMs (a quick check for this step is sketched below).
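
    If you are running on EC2, you can check whether step 4 can succeed by querying the instance metadata service directly. A minimal sketch, assuming IMDSv1 is enabled and an IAM role is attached to the instance:

    # Lists the IAM role(s) whose temporary credentials the VM exposes;
    # an empty response or a timeout means step 4 cannot succeed.
    curl http://169.254.169.254/latest/meta-data/iam/security-credentials/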

    The other possible methods to pass the credentials at runtime (note that supplying secrets this way is neither safe nor recommended):

    1) Embed them in the S3 URI

    hdfs dfs -ls s3a://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@my-bucket/
    

    If the secret key contains any + or / symbols, escape them with %2B and %2F respectively.
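
    For example, a hypothetical secret abc+def/ghi (not a real credential) would be written like this:

    # Placeholder key and secret, for illustration only:
    # + is escaped as %2B and / as %2F.
    hdfs dfs -ls s3a://AKIAIOSFODNN7EXAMPLE:abc%2Bdef%2Fghi@my-bucket/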

    Never share such a URL or any logs generated with it, and never use this inline authentication mechanism in production.

    2) Export environment variables for the session

    export AWS_ACCESS_KEY_ID=<YOUR_AWS_ACCESS_KEY_ID>
    export AWS_SECRET_ACCESS_KEY=<YOUR_AWS_SECRET_ACCESS_KEY>
    
    hdfs dfs -ls s3a://my-bucket/