amazon-web-services, kubernetes, amazon-eks

AWS SDK missing credentials when running in EKS with role-annotated ServiceAccount


We use ServiceAccounts with a role annotation so that the pods assume the role and use it to authenticate the AWS SDKs. This was working, but we set up a new cluster and something is off in our config...

The error we see when trying to use the SDK (specifically, the v2 SQS client) is:

Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1

It looks like the correct environment variables are in place when I run kubectl describe pod:

    Environment:
      AWS_STS_REGIONAL_ENDPOINTS:   regional
      AWS_DEFAULT_REGION:           us-east-1
      AWS_REGION:                   us-east-1
      AWS_ROLE_ARN:                 arn:aws:iam::**************:role/dev-node-api
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-snv7w (ro)

I have my ServiceAccount annotated like this:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-api-service-account
  namespace: app
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::************:role/dev-node-api

and I attach the service account to the deployment like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-api-deploy
  namespace: app
spec:
  template:
    spec:
      serviceAccountName: node-api-service-account
...

I also set the trust relationship on the role to allow federation from the cluster's OIDC provider:

$ aws eks describe-cluster --name my-cluster --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f 5

> **************************************1F61

Trust relationship in the IAM role:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::************:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/********************************1F61"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.us-east-1.amazonaws.com/id/********************************1F61:aud": "sts.amazonaws.com",
                    "oidc.eks.us-east-1.amazonaws.com/id/********************************1F61:sub": "system:serviceaccount:apis:node-api-service-account"
                }
            }
        }
    ]
}

So from what I can tell, the role is being properly acquired, but for some reason the JavaScript SDK is not picking up the credentials from the AWS_WEB_IDENTITY_TOKEN_FILE. Are there any logs I can use to debug this?
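One way to see which identity the pod is actually presenting is to decode the projected token and look at its claims. The following is only a minimal sketch using Node's built-in modules; the helper name decodeJwtClaims is made up for illustration, and it deliberately skips signature verification, so it is a debugging aid rather than a validation step.

```javascript
// Sketch: inspect the claims in the projected ServiceAccount token.
// decodeJwtClaims is a hypothetical helper, not part of the AWS SDK.
const fs = require("fs");

// Decode the payload (second segment) of a JWT WITHOUT verifying the signature.
function decodeJwtClaims(token) {
  const parts = token.trim().split(".");
  if (parts.length !== 3) {
    throw new Error("expected a JWT with three dot-separated segments");
  }
  // Node's "base64" decoder also accepts the URL-safe alphabet used by JWTs.
  const json = Buffer.from(parts[1], "base64").toString("utf8");
  return JSON.parse(json);
}

// Only runs when the variable is set and the file exists (i.e. inside the pod).
const tokenFile = process.env.AWS_WEB_IDENTITY_TOKEN_FILE;
if (tokenFile && fs.existsSync(tokenFile)) {
  const { iss, sub, aud } = decodeJwtClaims(fs.readFileSync(tokenFile, "utf8"));
  // Compare these with the Federated principal and the :aud / :sub
  // conditions in the role's trust relationship.
  console.log({ iss, sub, aud });
}
```

Run inside the container, this prints the issuer, subject, and audience that are sent with sts:AssumeRoleWithWebIdentity, which can then be compared by eye against the trust policy above.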


Solution

  • The problem was an incorrect namespace in the trust relationship condition. We changed the cluster's namespaces during this work and forgot to update the namespace in the trust relationship: the sub condition still referenced the apis namespace, while the ServiceAccount lives in app. Updating the condition and restarting the deployments resulted in successful SDK usage.
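This kind of mismatch can be caught with a small check that compares the subject a ServiceAccount would present against the :sub values in the trust policy. The sketch below is illustrative only: the function names are invented here, the issuer in the condition key is a placeholder (oidc.example/id/EXAMPLE) rather than the real masked ID, and it only looks at StringEquals conditions, so a policy using StringLike would need extra handling.

```javascript
// Sketch: does a trust policy's :sub condition match a given ServiceAccount?
// All function names here are hypothetical helpers for illustration.

// Subject claim format used for Kubernetes ServiceAccount tokens.
function expectedSubject(namespace, serviceAccountName) {
  return `system:serviceaccount:${namespace}:${serviceAccountName}`;
}

// Collect every "<issuer>:sub" value from StringEquals conditions.
function trustedSubjects(trustPolicy) {
  const subjects = [];
  // Statement and condition values may each be a single item or an array.
  for (const statement of [].concat(trustPolicy.Statement ?? [])) {
    const stringEquals = statement.Condition?.StringEquals ?? {};
    for (const [key, value] of Object.entries(stringEquals)) {
      if (key.endsWith(":sub")) {
        subjects.push(...[].concat(value));
      }
    }
  }
  return subjects;
}

function trustPolicyAllows(trustPolicy, namespace, serviceAccountName) {
  return trustedSubjects(trustPolicy).includes(
    expectedSubject(namespace, serviceAccountName)
  );
}

// Condition values mirror the question; the issuer key is a placeholder.
const policy = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Action: "sts:AssumeRoleWithWebIdentity",
      Condition: {
        StringEquals: {
          "oidc.example/id/EXAMPLE:aud": "sts.amazonaws.com",
          "oidc.example/id/EXAMPLE:sub":
            "system:serviceaccount:apis:node-api-service-account",
        },
      },
    },
  ],
};

console.log(trustPolicyAllows(policy, "app", "node-api-service-account"));  // false
console.log(trustPolicyAllows(policy, "apis", "node-api-service-account")); // true
```

With the values from the question, the check fails for the app namespace the ServiceAccount is actually in and passes only for the stale apis namespace, which is the mismatch described above.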