
HDFS to S3 DistCp - Access Keys

To copy a file from HDFS to an S3 bucket, I used this command:

hadoop distcp -Dfs.s3a.access.key=ACCESS_KEY_HERE \
  -Dfs.s3a.secret.key=SECRET_KEY_HERE /path/in/hdfs s3a://BUCKET_NAME/

But the access key and secret key are visible on the command line, which is not secure. Is there a way to provide the credentials from a file? I don't want to edit the config file, which is one of the methods I came across.


  • Recent (2.8+) versions of Hadoop let you hide your credentials in a jceks file; the Hadoop S3 documentation page covers this. That way there is no need to put any secrets on the command line at all: you share the file across the cluster and then, in the distcp command, set the credential provider path option to the path of that file, like jceks://
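The jceks approach described above can be sketched roughly as follows. This is a minimal sketch, not a definitive recipe: the keystore location (NameNode host, port and file path) is a hypothetical placeholder, and it assumes the standard `hadoop credential create` subcommand and the `hadoop.security.credential.provider.path` property. The script only prints the commands unless `RUN=1` is set, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical keystore location: substitute your own NameNode host, port and path.
PROVIDER="jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks"
SRC="/path/in/hdfs"
DEST="s3a://BUCKET_NAME/"

# Print each command; execute it only when RUN=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# One-time setup: each call prompts for the value, so the secret is never
# typed as a command-line argument.
run hadoop credential create fs.s3a.access.key -provider "$PROVIDER"
run hadoop credential create fs.s3a.secret.key -provider "$PROVIDER"

# The copy itself: only the keystore path appears on the command line.
run hadoop distcp \
    -Dhadoop.security.credential.provider.path="$PROVIDER" \
    "$SRC" "$DEST"
```

The keystore file should be readable only by the users who run the job, since anyone who can read it can recover the keys.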

    Fan: if you are running in EC2, the IAM role credentials should be picked up automatically from the default chain of credential providers: after looking for the config options and environment variables, it tries a GET of the EC2 HTTP endpoint that serves up the session credentials. If that's not happening, make sure that com.amazonaws.auth.InstanceProfileCredentialsProvider is on the list of credential providers. It's a bit slower than the others (and can get throttled), so it is best placed near the end of the list.
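For the EC2 case, an explicit provider list might look like the sketch below. It assumes the list is set through the `fs.s3a.aws.credentials.provider` property and that the three class names shown are available in your Hadoop and AWS SDK versions (newer releases may use different names), so treat the exact list as an assumption to check against your installation. The command is printed rather than executed.

```shell
#!/bin/sh
# Providers are tried in order. The EC2 instance-profile lookup goes last
# because it is the slowest of the three and can be throttled.
PROVIDERS="org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
PROVIDERS="$PROVIDERS,com.amazonaws.auth.EnvironmentVariableCredentialsProvider"
PROVIDERS="$PROVIDERS,com.amazonaws.auth.InstanceProfileCredentialsProvider"

# Printed for review; remove the leading echo to run the copy.
echo hadoop distcp \
    -Dfs.s3a.aws.credentials.provider="$PROVIDERS" \
    /path/in/hdfs s3a://BUCKET_NAME/
```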