Tags: kubernetes, terraform, amazon-eks, external-dns

Troubleshooting EKS External-Dns IAM


Problem

I am trying to troubleshoot the following error message, which I get by running kubectl logs external-dns-xxxxxxxxxx-xxxxx:

time="<timestamp>" level=error msg="records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: <uuid>"

My Question

I am trying to figure out ...

  1. Where does this message get generated from? I can't tell whether it comes from my service, serviceaccount, clusterrole, clusterrolebinding, the pod, or something else. Any clarification or links to useful explanations would be appreciated. (My guess right now is the pod, based on the k8s documentation, but I'm not positive, and I'm not sure how to trace it to confirm.)
  2. Why are the IAM permissions I've explicitly specified not being assumed by external-dns? Any explanation of how the external-dns pod assumes its role and carries out its tasks would be GREATLY appreciated!
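On question 1, my current understanding is that the 403 comes from STS rejecting the AssumeRoleWithWebIdentity call that the pod makes with its projected service-account token, so one useful check is whether that token's iss/sub/aud claims actually match the role's trust policy. A minimal sketch for inspecting those claims (the decode_jwt_claims helper name is my own; it decodes without verifying the signature, which is fine for inspection only):

```python
import base64
import json

def decode_jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT without verifying its signature.

    Intended for inspecting the projected token that external-dns
    presents to STS (the file that AWS_WEB_IDENTITY_TOKEN_FILE points
    to inside the pod). Compare the decoded "iss" and "sub" claims
    against the OIDC provider and service account named in the IAM
    role's trust policy.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

For example, after copying the token out of the pod with something like `kubectl exec deploy/external-dns -- cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token`, the decoded `sub` should read `system:serviceaccount:default:external-dns` for my setup.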

My Goal

I'm pretty new to K8s, and am trying to deploy an EKS cluster with external-dns to allow automated management of my Route53 records.

What I've tried so far

  1. I've experimented with expanding the IAM permissions, opening them up as wide as I could.
  2. I've explicitly added eks.amazonaws.com/role-arn annotations to all of my resources.
  3. I've tried moving the external-dns deployment from the kube-system namespace to default, since that was recommended on a GitHub issue with the same error message.

Deployment Details

I'm using Terraform to deploy most of this: the EKS cluster, node group, OIDC provider, and Helm releases.
For now I've opted to share the results of the deployment rather than the configs, to keep this question a manageable size. If you'd like to see the configs, just ask and I'll share everything I have.

Kubectl Descriptions

kubectl describe service external-dns

Name:              external-dns
Namespace:         default
Labels:            app.kubernetes.io/instance=external-dns
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=external-dns
                   helm.sh/chart=external-dns-6.9.0
Annotations:       eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
                   meta.helm.sh/release-name: external-dns
                   meta.helm.sh/release-namespace: default
Selector:          app.kubernetes.io/instance=external-dns,app.kubernetes.io/name=external-dns
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                172.20.233.113
IPs:               172.20.233.113
Port:              http  7979/TCP
TargetPort:        http/TCP
Endpoints:         10.12.13.93:7979
Session Affinity:  None
Events:            <none>

kubectl describe serviceaccount external-dns

Name:                external-dns
Namespace:           default
Labels:              app.kubernetes.io/managed-by=Helm
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
                     meta.helm.sh/release-name: external-dns
                     meta.helm.sh/release-namespace: default
Image pull secrets:  <none>
Mountable secrets:   external-dns-token-twgpb
Tokens:              external-dns-token-twgpb
Events:              <none>

kubectl describe clusterrole external-dns

Name:         external-dns
Labels:       <none>
Annotations:  eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
PolicyRule:
  Resources                     Non-Resource URLs  Resource Names  Verbs
  ---------                     -----------------  --------------  -----
  endpoints                     []                 []              [get watch list]
  nodes                         []                 []              [get watch list]
  pods                          []                 []              [get watch list]
  services                      []                 []              [get watch list]
  ingresses.extensions          []                 []              [get watch list]
  gateways.networking.istio.io  []                 []              [get watch list]
  ingresses.networking.k8s.io   []                 []              [get watch list]

kubectl describe clusterrolebindings.rbac.authorization.k8s.io external-dns

Name:         external-dns
Labels:       <none>
Annotations:  eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
Role:
  Kind:  ClusterRole
  Name:  external-dns
Subjects:
  Kind            Name          Namespace
  ----            ----          ---------
  ServiceAccount  external-dns  default

kubectl describe ingress -n kube-system

Name:             aws-lb-ctrlr
Labels:           <none>
Namespace:        kube-system
Address:          
Ingress Class:    <none>
Default backend:  <default>
Rules:
  Host        Path  Backends
  ----        ----  --------
  *           
              /*   aws-load-balancer-controller:80 (<error: endpoints "aws-load-balancer-controller" not found>)
Annotations:  alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
              alb.ingress.kubernetes.io/listen-ports: [{'HTTP': 80}]
              alb.ingress.kubernetes.io/scheme: internet-facing
              external-dns.alpha.kubernetes.io/hostname: <my-domain.tld>
              kubernetes.io/ingress.class: alb
Events:       <none>

kubectl describe pod

Name:             external-dns-xxxxxxxxxx-xxxxx
Namespace:        default
Priority:         0
Service Account:  external-dns
Node:             ip-10-12-13-107.ec2.internal/10.12.13.107
Start Time:       Tue, 20 Sep 2022 10:48:06 -0400
Labels:           app.kubernetes.io/instance=external-dns
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=external-dns
                  helm.sh/chart=external-dns-6.9.0
                  pod-template-hash=xxxxxxxxxx
Annotations:      kubernetes.io/psp: eks.privileged
Status:           Running
IP:               10.12.13.93
IPs:
  IP:           10.12.13.93
Controlled By:  ReplicaSet/external-dns-xxxxxxxxxx
Containers:
  external-dns:
    Container ID:  docker://5b49f49f7b9c0be8cb00835f117eedccaff3d5bb4ebfecb4bc6af771d2b3d336
    Image:         docker.io/bitnami/external-dns:0.12.2-debian-11-r14
    Image ID:      docker-pullable://bitnami/external-dns@sha256:195dec0f60c9137952ea0604623c7eb001ece4142916bdfb0cc79f5d9cdc4b62
    Port:          7979/TCP
    Host Port:     0/TCP
    Args:
      --metrics-address=:7979
      --log-level=debug
      --log-format=text
      --domain-filter=<my-domain.tld>
      --policy=sync
      --provider=aws
      --registry=txt
      --interval=1m
      --txt-owner-id=<hosted-zone-id>
      --source=service
      --source=ingress
      --aws-api-retries=3
      --aws-zone-type=public
      --aws-assume-role=arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
      --aws-batch-change-size=1000
    State:          Running
      Started:      Tue, 20 Sep 2022 10:48:13 -0400
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:http/healthz delay=10s timeout=5s period=10s #success=1 #failure=2
    Readiness:      http-get http://:http/healthz delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment:
      AWS_DEFAULT_REGION:           us-east-1
      AWS_STS_REGIONAL_ENDPOINTS:   regional
      AWS_ROLE_ARN:                 arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d82r7 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  aws-iam-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  kube-api-access-d82r7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  3m44s  default-scheduler  Successfully assigned default/external-dns-xxxxxxxxxx-xxxxx to ip-10-12-13-107.ec2.internal
  Normal  Pulling    3m43s  kubelet            Pulling image "docker.io/bitnami/external-dns:0.12.2-debian-11-r14"
  Normal  Pulled     3m40s  kubelet            Successfully pulled image "docker.io/bitnami/external-dns:0.12.2-debian-11-r14" in 3.588418583s
  Normal  Created    3m38s  kubelet            Created container external-dns
  Normal  Started    3m37s  kubelet            Started container external-dns

kubectl describe deployments.apps

Name:                   external-dns
Namespace:              default
CreationTimestamp:      Tue, 20 Sep 2022 10:48:06 -0400
Labels:                 app.kubernetes.io/instance=external-dns
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=external-dns
                        helm.sh/chart=external-dns-6.9.0
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: external-dns
                        meta.helm.sh/release-namespace: default
Selector:               app.kubernetes.io/instance=external-dns,app.kubernetes.io/name=external-dns
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app.kubernetes.io/instance=external-dns
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=external-dns
                    helm.sh/chart=external-dns-6.9.0
  Service Account:  external-dns
  Containers:
   external-dns:
    Image:      docker.io/bitnami/external-dns:0.12.2-debian-11-r14
    Port:       7979/TCP
    Host Port:  0/TCP
    Args:
      --metrics-address=:7979
      --log-level=debug
      --log-format=text
      --domain-filter=<my-domain.tld>
      --policy=sync
      --provider=aws
      --registry=txt
      --interval=1m
      --txt-owner-id=<hosted-zone-id>
      --source=service
      --source=ingress
      --aws-api-retries=3
      --aws-zone-type=public
      --aws-assume-role=arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
      --aws-batch-change-size=1000
    Liveness:   http-get http://:http/healthz delay=10s timeout=5s period=10s #success=1 #failure=2
    Readiness:  http-get http://:http/healthz delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment:
      AWS_DEFAULT_REGION:  us-east-1
    Mounts:                <none>
  Volumes:                 <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   external-dns-xxxxxxxxxx (1/1 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  9m30s  deployment-controller  Scaled up replica set external-dns-xxxxxxxxxx to 1

AWS IAM (AllowExternalDNSUpdates)

IAM Role (Trust Relationship)

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::<userid>:oidc-provider/oidc.eks.region-code.amazonaws.com/id/<oidc-id>"
            },
            "Action": "sts:AssumeRoleWithWebIdentity"
        }
    ]
}
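For comparison, the trust policies shown in the AWS docs for IAM roles for service accounts (IRSA) also include a Condition block pinning the role to one specific service account; without it, any pod in the cluster that presents a token from this OIDC provider can assume the role. A sketch of that shape (region, OIDC id, and account id are placeholders, and the service account here is the one from my setup):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::<userid>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<oidc-id>"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.<region>.amazonaws.com/id/<oidc-id>:sub": "system:serviceaccount:default:external-dns",
                    "oidc.eks.<region>.amazonaws.com/id/<oidc-id>:aud": "sts.amazonaws.com"
                }
            }
        }
    ]
}
```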

IAM Policy (Permissions)

{
    "Statement": [
        {
            "Action": "route53:ChangeResourceRecordSets",
            "Effect": "Allow",
            "Resource": "arn:aws:route53:::hostedzone/*",
            "Sid": ""
        },
        {
            "Action": [
                "route53:ListResourceRecordSets",
                "route53:ListHostedZones"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": ""
        }
    ],
    "Version": "2012-10-17"
}

Solution

  • Answer

    So basically it was two things:

    1. (Credit to @Jordanm, in the comments) The trust relationship was incorrect; I edited the post to fix it and re-ran my configs. The error then changed to: records retrieval failed: failed to list hosted zones: AccessDenied: User: arn:aws:sts::<userid>:assumed-role/AllowExternalDNSUpdates/1663776911448118272 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::<userid>:role/AllowExternalDNSUpdates\n\tstatus
    2. That second error meant I had to go back and fix my Terraform Helm config, removing the "assume-role" setting. If you see an "assumed-role ... not authorized to perform: sts:AssumeRole" error like this, you are just assuming the role twice.
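To expand on the second point: with IRSA, the AWS SDK inside the pod already calls sts:AssumeRoleWithWebIdentity using the injected AWS_ROLE_ARN and projected token, so the --aws-assume-role flag makes the already-assumed role try to assume itself a second time. In my Terraform Helm config the fix amounted to dropping the chart value that generates that flag, keeping only the service account annotation. Roughly (the aws.assumeRoleArn value name is from the Bitnami chart as I understand it; verify against your chart version):

```hcl
resource "helm_release" "external_dns" {
  name       = "external-dns"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "external-dns"

  # Keep the IRSA annotation on the service account...
  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = "arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates"
  }

  # ...but do NOT also set aws.assumeRoleArn; that was generating the
  # --aws-assume-role flag and causing the role to be assumed twice.
}
```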