Search code examples
airflowgoogle-kubernetes-engineservice-accountsgoogle-cloud-composerworkload-identity

Workload Identity & Service Accounts for Composer 2 / GKE Autopilot Cluster PodOperator tasks


I'm trying to run GKEStartPodOperator/KubernetesPodOperator tasks in a Composer 2 environment, which makes use of a GKE cluster in autopilot mode. We have an existing Composer 1 environment with a GKE cluster not in autopilot mode. Our tasks that authenticate with Google Cloud Platform services (BigQuery, GCS, etc), fail with 401 unauthorized in the Composer 2 environment, but succeed in the Composer 1 environment.

In the log files, I can tell that the tasks in both environments get their credentials via requests to the metadata server. The key difference is tasks in Composer 1 request the service account assigned to the node the task runs in, but the tasks in Composer 2 request what seems to be a workload identity pool like [project-name].svc.id.goog.

The logs from Composer 1 are:

[2021-10-22 12:38:01,349] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-22 12:38:01,351] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-22 12:38:01,352] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-22 12:38:01,359] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-22 12:38:01,374] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-22 12:38:01,392] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-22 12:38:01,393] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-22 12:38:01,393] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-22 12:38:01,395] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-22 12:38:01,398] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-22 12:38:01,412] {pod_launcher.py:148} INFO - DEBUG:google.cloud.bigquery.opentelemetry_tracing:This service is instrumented using OpenTelemetry. OpenTelemetry could not be imported; please add opentelemetry-api and opentelemetry-instrumentation packages in order to get BigQuery Tracing data.
[2021-10-22 12:38:01,414] {pod_launcher.py:148} INFO - DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
[2021-10-22 12:38:01,415] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
[2021-10-22 12:38:01,437] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): metadata.google.internal:80
[2021-10-22 12:38:01,452] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true HTTP/1.1" 200 226
[2021-10-22 12:38:01,454] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/[project-id][email protected]/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform
[2021-10-22 12:38:01,463] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/[project-id][email protected]/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform HTTP/1.1" 200 1049
[2021-10-22 12:38:01,468] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): bigquery.googleapis.com:443
[2021-10-22 12:38:02,028] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "POST /bigquery/v2/projects/[project-nam]/jobs?prettyPrint=false HTTP/1.1" 200 None

The logs from Composer 2 are:

[2021-10-21 13:56:06,619] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-21 13:56:06,620] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-21 13:56:06,620] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-21 13:56:06,621] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-21 13:56:06,624] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-21 13:56:06,634] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-21 13:56:06,641] {pod_launcher.py:149} INFO - DEBUG:google.cloud.bigquery.opentelemetry_tracing:This service is instrumented using OpenTelemetry. OpenTelemetry could not be imported; please add opentelemetry-api and opentelemetry-instrumentation packages in order to get BigQuery Tracing data.
[2021-10-21 13:56:06,642] {pod_launcher.py:149} INFO - DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
[2021-10-21 13:56:06,642] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
[2021-10-21 13:56:06,714] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): metadata.google.internal:80
[2021-10-21 13:56:06,720] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true HTTP/1.1" 200 121
[2021-10-21 13:56:06,721] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/[project-name].svc.id.goog/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform
[2021-10-21 13:56:06,831] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/[project-name].svc.id.goog/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform HTTP/1.1" 200 765
[2021-10-21 13:56:06,833] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): bigquery.googleapis.com:443
[2021-10-21 13:56:06,866] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "POST /bigquery/v2/projects/[project-name]/jobs?prettyPrint=false HTTP/1.1" 401 None

Based on Workload Identity documentation, I would guess I need to bind a specific service account to the node/node-pool running the pod, but I'm not sure how to do that with Composer 2 GKE Autopilot since nodes are managed for me. Composer 2 does not currently have documentation available on using KubernetesPodOperator or GKEStartPodOperator.

In summary, my question is: How should I configure my Composer 2 environment PodOperator tasks to utilize a specific service account to authenticate with GCP services?


Solution

  • I received some guidance from operations engineers, and now have a KubernetesPodOperator task successfully authenticating with GCP services via a service account. I'll share the steps and helpful bits of info below.

    First, follow the steps for Authenticating to Google Cloud using Workload Identity. I thought Composer 2 configured the kubernetes <> google cloud service account binding and annotation for me, but that wasn't the case. I had to create the namespace, kubernetes service account, binding for the ksa and gsa, and the annotation for the KSA just like the instructions say.

    Second, I had to update my KubernetesPodOperator instance with the parameters namespace and service_account_name set to the namespace and kubernetes service account I made in the first step.

    An upload of the DAG and a task execution later, and I can confirm that these two steps enabled my task to request the bound Google Service Account, and from there the google client library authentication succeeded in my test against BigQuery.