I'm following the Getting started with Endpoints for GKE with ESPv2. I'm using Workload Identity Federation and Autopilot on the GKE cluster.
I've been running into the error:
F0110 03:46:24.304229 8 server.go:54] fail to initialize config manager: http call to GET https://servicemanagement.googleapis.com/v1/services/name:bookstore.endpoints.<project>.cloud.goog/rollouts?filter=status=SUCCESS returns not 200 OK: 403 Forbidden
Which ultimately leads to a transport failure error and shut down of the Pod.
My first step was to investigate permission issues, but I could really use some outside perspective on this as I've been going around in circles on this.
Here's my config:
>> gcloud container clusters describe $GKE_CLUSTER_NAME \
--zone=$GKE_CLUSTER_ZONE \
--format='value[delimiter="\n"](nodePools[].config.oauthScopes)'
['https://www.googleapis.com/auth/devstorage.read_only',
'https://www.googleapis.com/auth/logging.write',
'https://www.googleapis.com/auth/monitoring',
'https://www.googleapis.com/auth/service.management.readonly',
'https://www.googleapis.com/auth/servicecontrol',
'https://www.googleapis.com/auth/trace.append']
>> gcloud container clusters describe $GKE_CLUSTER_NAME \
--zone=$GKE_CLUSTER_ZONE \
--format='value[delimiter="\n"](nodePools[].config.serviceAccount)'
default
default
Service-Account-Name: test-espv2
Roles
Cloud Trace Agent
Owner
Service Account Token Creator
Service Account User
Service Controller
Workload Identity User
I've associated the WIF svc-act with the Cluster with the following yaml
apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
iam.gke.io/gcp-service-account: test-espv2@<project>.iam.gserviceaccount.com
name: test-espv2
namespace: eventing
And then I've associated the pod with the test-espv2
svc-act
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: esp-grpc-bookstore
namespace: eventing
spec:
replicas: 1
selector:
matchLabels:
app: esp-grpc-bookstore
template:
metadata:
labels:
app: esp-grpc-bookstore
spec:
serviceAccountName: test-espv2
Since the gcr.io/endpoints-release/endpoints-runtime:2
is limited,
I created a test container and deployed it into the same eventing
namespace.
Within the container, I'm able to retrieve the endpoint service config with the following command:
curl --fail -o "service.json" -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://servicemanagement.googleapis.com/v1/services/${SERVICE}/configs/${CONFIG_ID}?view=FULL"
And also within the container, I'm running as the impersonated service account, tested with:
curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/
Are there any other tests I can run to help me debug this issue?
Thanks in advance,
I've finally figured out the issue. It was in 2 parts.
kubectl annotate serviceaccount
commands
nodeSelector: iam.gke.io/gke-metadata-server-enabled: "true"
due to AutopilotDoing this enabled a successful kube deployment as displayed by the logs.
Next error I had was
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>
Manually provisioning them, as shown here :
solved my issues.