
AWS EKS: unable to attach IAM role to pods


So I created an AWS EKS cluster and proceeded to set up a service mesh with AWS App Mesh on EKS, following the EKS workshop and the AWS App Mesh user guide. The App Mesh controller installs successfully.

kubectl get pods confirms it:

NAMESPACE        NAME                                            READY   STATUS    RESTARTS   AGE
appmesh-system   appmesh-controller-847f957bc8-s2k7l             1/1     Running   0          57m

Then I did the following:

  1. Create a namespace and a mesh (following the user guide), using this YAML config:
apiVersion: v1
kind: Namespace
metadata:
  name: example
  labels:
    mesh: v-mesh
    gateway: ingress-gw
    appmesh.k8s.aws/sidecarInjectorWebhook: enabled
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: Mesh
metadata:
  name: v-mesh
spec:
  namespaceSelector:
    matchLabels:
      mesh: v-mesh
  egressFilter:
    type: ALLOW_ALL
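Assuming the manifest above is saved as mesh.yaml (the filename is mine, not from the guide), it can be applied and checked like this:

```shell
# Apply the namespace and mesh manifest:
kubectl apply -f mesh.yaml

# Confirm the Mesh custom resource exists in the cluster:
kubectl get mesh v-mesh

# Per the user guide, confirm the controller actually created the mesh
# in App Mesh itself:
aws appmesh describe-mesh --mesh-name v-mesh
```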
  2. Create an IAM service account. kubectl describe for the service account returns this:
Name:                example-svc-acct
Namespace:           example
Labels:              <none>
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::xxxxxxxx:role/eksctl-eks-addon-iamserviceaccount-example-Role1
Image pull secrets:  <none>
Mountable secrets:   example-svc-acct-token-lgrs2
Tokens:              example-svc-acct-token-lgrs2
Events:              <none>
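For reference, the service account in step 2 was created with eksctl; a sketch of the command, with a placeholder cluster name, account ID, and policy ARN (the attached policy must allow appmesh:StreamAggregatedResources):

```shell
# Create an IAM role and bind it to a Kubernetes service account via IRSA
# (cluster name, account ID, and policy ARN below are placeholders):
eksctl create iamserviceaccount \
  --cluster my-cluster \
  --namespace example \
  --name example-svc-acct \
  --attach-policy-arn arn:aws:iam::111122223333:policy/my-appmesh-envoy-policy \
  --override-existing-serviceaccounts \
  --approve
```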

I can see the required annotation is present.

  3. Deploy my services using Helm. kubectl get pods -n example shows:

NAME                      READY   STATUS    RESTARTS   AGE
svc1-5d4b4d6485-m7t7g      1/2     Running   0          7s
svc2-76cb5fd545-nqgx5      2/3     Running   0          7s
svc2-76cb5fd545-vsbnj      2/3     Running   0          7s
svc3-84f97bd64f-q9hjx      1/2     Running   0          7s

The envoy container is unable to move to ready state.

  4. Inspecting the environment variables in the Envoy container shows that some are missing:
kubectl exec -n example svc3-84f97bd64f-q9hjx -c envoy env | grep AWS
AWS_REGION=us-east-2

As per the docs, AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN should have been present.
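A quick way to check any pod for the two variables the EKS pod identity webhook is supposed to inject is to capture the container's environment and test it. A small sketch (the pod name in the commented line is taken from the output above):

```shell
# check_irsa: given an `env` dump as its argument, report whether the two
# variables the EKS pod identity webhook injects for IRSA are present.
check_irsa() {
  env_dump="$1"
  missing=""
  case "$env_dump" in *AWS_ROLE_ARN=*) ;; *) missing="$missing AWS_ROLE_ARN" ;; esac
  case "$env_dump" in *AWS_WEB_IDENTITY_TOKEN_FILE=*) ;; *) missing="$missing AWS_WEB_IDENTITY_TOKEN_FILE" ;; esac
  if [ -z "$missing" ]; then
    echo "IRSA env vars present"
  else
    echo "missing:$missing"
  fi
}

# Against the cluster:
# check_irsa "$(kubectl exec -n example svc3-84f97bd64f-q9hjx -c envoy -- env)"

# Demo with the environment observed above:
check_irsa "AWS_REGION=us-east-2"
# prints: missing: AWS_ROLE_ARN AWS_WEB_IDENTITY_TOKEN_FILE
```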

  5. kubectl logs for the Envoy container shows permission problems:
[2021-08-02 22:07:12.516][1][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:101] StreamAggregatedResources gRPC config stream closed: 7, Unauthorized to perform appmesh:StreamAggregatedResources for arn:aws:appmesh:us-east-2:xxxxx:mesh/v-mesh/virtualNode/svc3-vn_example.
(the same warning repeats every few seconds)

The role attached to the service account permits the action appmesh:StreamAggregatedResources on all resources.
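For reference, that permission corresponds to a policy along these lines (a sketch; the actual policy document may differ):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "appmesh:StreamAggregatedResources",
      "Resource": "*"
    }
  ]
}
```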

I can see the problem in step 3. Having looked in different places for an entire day, I cannot figure out what I am missing to get the required role attached to the container, and thus have the needed environment variables set.

Any pointers would be appreciated. Thanks.

More info:

$ eksctl version
0.42.0
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.11-eks-cfdc40", GitCommit:"cfdc40d4c1b7d14eb60152107963ae41aa2e4804", GitTreeState:"clean", BuildDate:"2020-09-17T17:10:39Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.8-eks-96780e", GitCommit:"96780e1b30acbf0a52c38b6030d7853e575bcdf3", GitTreeState:"clean", BuildDate:"2021-03-10T21:32:29Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

Solution

  • Apparently, it was a simple mistake: serviceAccountName was missing from the deployment's pod template spec.

    spec:
      serviceAccountName: {{ .Values.serviceAccount.name }}
    

    Added that, and the problem went away.
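For anyone hitting the same wall: the field belongs in the pod template spec of the Deployment, not at the top level. Without it, the pod runs as the namespace's default service account, which has no IRSA role annotation, so the webhook never injects AWS_ROLE_ARN or AWS_WEB_IDENTITY_TOKEN_FILE. A minimal sketch of the relevant part of a Helm deployment template (names and values keys are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-svc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ .Release.Name }}-svc
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-svc
    spec:
      # The crucial line: run the pod as the annotated service account
      # so the pod identity webhook can inject the IRSA credentials.
      serviceAccountName: {{ .Values.serviceAccount.name }}
      containers:
        - name: app
          image: {{ .Values.image }}
```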