
How to tell what AWS credentials Spark is using to read S3 files?


I am running an Oozie job that previously ran fine, and now I get a permission-denied error when accessing S3 files. I am trying to figure out which credentials it is using and where to fix them.

As far as I can tell, credentials can come from several locations, and I am not sure of the order of precedence (e.g. ~/.aws/credentials, environment variables, Hadoop configuration, IAM role, etc.).

Is there a way to tell which credentials are actually in use? Is it possible to print the active AWS access key ID in the Spark logs?


Solution

    1. AWS login details are deliberately not logged, for security reasons.
    2. spark-submit will pick up the AWS_* environment variables from your desktop and set the corresponding fs.s3a.* values, overriding any already configured.
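That env-var-to-property mapping can be sketched in plain Python (illustrative only; the actual propagation happens inside spark-submit / the Spark-Hadoop glue, not in user code):

```python
# Sketch of how AWS_* environment variables map onto fs.s3a.* options.
# The mapping below mirrors the conventional pairing; it is an assumption
# for illustration, not the real spark-submit implementation.
def env_to_s3a_conf(env):
    """Return the fs.s3a.* settings implied by AWS_* variables in `env`."""
    mapping = {
        "AWS_ACCESS_KEY_ID": "fs.s3a.access.key",
        "AWS_SECRET_ACCESS_KEY": "fs.s3a.secret.key",
        "AWS_SESSION_TOKEN": "fs.s3a.session.token",
    }
    return {
        s3a_key: env[env_key]
        for env_key, s3a_key in mapping.items()
        if env_key in env
    }
```

So if AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set in the shell that launches the job, they can silently override whatever is in core-site.xml.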

    In the s3a connector, the order of precedence is:

    1. secrets in the URI (bad, avoid; removed from recent releases)
    2. fs.s3a properties
    3. env vars
    4. IAM credentials supplied to an EC2 VM
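The precedence list above amounts to "first non-empty source wins", which can be sketched in plain Python (illustrative only; the real logic is the credential provider chain inside the Hadoop S3A connector):

```python
# Sketch of the s3a credential lookup order: try each source in
# precedence order and return the first one that yields credentials.
def first_credentials(sources):
    """`sources` is an ordered list of (name, credentials-or-None) pairs."""
    for name, creds in sources:
        if creds:
            return name, creds
    return None  # no credentials found anywhere
```

With this model, setting fs.s3a properties masks the env vars, and both mask the EC2 IAM role.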

    You can configure the list of authentication providers to change the order, remove providers, and so on.
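For example, the provider list can be pinned explicitly via fs.s3a.aws.credentials.provider. A hedged sketch (the exact provider class names vary by Hadoop version; older releases use com.amazonaws.auth.InstanceProfileCredentialsProvider instead of the IAMInstanceCredentialsProvider shown here):

```shell
# Force s3a to try config-file secrets, then env vars, then the EC2 IAM role,
# in exactly this order -- URI secrets are excluded entirely.
spark-submit \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,com.amazonaws.auth.EnvironmentVariableCredentialsProvider,org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider \
  your_job.py
```

Trimming the list to a single provider is also a quick way to confirm which source is actually satisfying the request: if the job still authenticates with only one provider configured, that is the credential source in use.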