
Workload Identity & Service Accounts for Composer 2 / GKE Autopilot Cluster PodOperator tasks

I'm trying to run GKEStartPodOperator/KubernetesPodOperator tasks in a Composer 2 environment, which uses a GKE cluster in Autopilot mode. We have an existing Composer 1 environment whose GKE cluster is not in Autopilot mode. Our tasks that authenticate with Google Cloud Platform services (BigQuery, GCS, etc.) fail with 401 Unauthorized in the Composer 2 environment but succeed in the Composer 1 environment.

In the log files, I can tell that tasks in both environments get their credentials via requests to the metadata server. The key difference is that tasks in Composer 1 request the service account assigned to the node the task runs on, while tasks in Composer 2 request what seems to be a Workload Identity pool like [project-name].

The logs from Composer 1 are:

[2021-10-22 12:38:01,349] {} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-22 12:38:01,351] {} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-22 12:38:01,352] {} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-22 12:38:01,359] {} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-22 12:38:01,374] {} INFO - DEBUG:google.auth.transport._http_client:Making request: GET
[2021-10-22 12:38:01,392] {} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-22 12:38:01,393] {} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-22 12:38:01,393] {} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-22 12:38:01,395] {} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-22 12:38:01,398] {} INFO - DEBUG:google.auth.transport._http_client:Making request: GET
[2021-10-22 12:38:01,412] {} INFO - service is instrumented using OpenTelemetry. OpenTelemetry could not be imported; please add opentelemetry-api and opentelemetry-instrumentation packages in order to get BigQuery Tracing data.
[2021-10-22 12:38:01,414] {} INFO - DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
[2021-10-22 12:38:01,415] {} INFO - DEBUG:google.auth.transport.requests:Making request: GET
[2021-10-22 12:38:01,437] {} INFO - DEBUG:urllib3.connectionpool:Starting new HTTP connection (1):
[2021-10-22 12:38:01,452] {} INFO - DEBUG:urllib3.connectionpool: "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true HTTP/1.1" 200 226
[2021-10-22 12:38:01,454] {} INFO - DEBUG:google.auth.transport.requests:Making request: GET[project-id]
[2021-10-22 12:38:01,463] {} INFO - DEBUG:urllib3.connectionpool: "GET /computeMetadata/v1/instance/service-accounts/[project-id] HTTP/1.1" 200 1049
[2021-10-22 12:38:01,468] {} INFO - DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1):
[2021-10-22 12:38:02,028] {} INFO - DEBUG:urllib3.connectionpool: "POST /bigquery/v2/projects/[project-name]/jobs?prettyPrint=false HTTP/1.1" 200 None

The logs from Composer 2 are:

[2021-10-21 13:56:06,619] {} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-21 13:56:06,620] {} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-21 13:56:06,620] {} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-21 13:56:06,621] {} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-21 13:56:06,624] {} INFO - DEBUG:google.auth.transport._http_client:Making request: GET
[2021-10-21 13:56:06,634] {} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-21 13:56:06,635] {} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-21 13:56:06,635] {} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-21 13:56:06,635] {} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-21 13:56:06,635] {} INFO - DEBUG:google.auth.transport._http_client:Making request: GET
[2021-10-21 13:56:06,641] {} INFO - service is instrumented using OpenTelemetry. OpenTelemetry could not be imported; please add opentelemetry-api and opentelemetry-instrumentation packages in order to get BigQuery Tracing data.
[2021-10-21 13:56:06,642] {} INFO - DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
[2021-10-21 13:56:06,642] {} INFO - DEBUG:google.auth.transport.requests:Making request: GET
[2021-10-21 13:56:06,714] {} INFO - DEBUG:urllib3.connectionpool:Starting new HTTP connection (1):
[2021-10-21 13:56:06,720] {} INFO - DEBUG:urllib3.connectionpool: "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true HTTP/1.1" 200 121
[2021-10-21 13:56:06,721] {} INFO - DEBUG:google.auth.transport.requests:Making request: GET[project-name]
[2021-10-21 13:56:06,831] {} INFO - DEBUG:urllib3.connectionpool: "GET /computeMetadata/v1/instance/service-accounts/[project-name] HTTP/1.1" 200 765
[2021-10-21 13:56:06,833] {} INFO - DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1):
[2021-10-21 13:56:06,866] {} INFO - DEBUG:urllib3.connectionpool: "POST /bigquery/v2/projects/[project-name]/jobs?prettyPrint=false HTTP/1.1" 401 None

Based on the Workload Identity documentation, I would guess I need to bind a specific service account to the node or node pool running the pod, but I'm not sure how to do that with Composer 2 on GKE Autopilot, since the nodes are managed for me. Composer 2 does not currently have documentation on using KubernetesPodOperator or GKEStartPodOperator.

In summary, my question is: How should I configure my Composer 2 environment PodOperator tasks to utilize a specific service account to authenticate with GCP services?


  • I received some guidance from operations engineers, and now have a KubernetesPodOperator task successfully authenticating with GCP services via a service account. I'll share the steps and helpful bits of info below.

    First, follow the steps for Authenticating to Google Cloud using Workload Identity. I assumed Composer 2 set up the binding between the Kubernetes service account (KSA) and the Google service account (GSA), along with the KSA annotation, for me, but that wasn't the case. Just as the instructions say, I had to create the namespace and the KSA, add the IAM binding between the KSA and the GSA, and annotate the KSA with the GSA.
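    The first step can be sketched in Python as a small generator for the Kubernetes objects and the IAM binding command. This is a hedged sketch, not Composer tooling: the project, namespace, KSA, and GSA names are hypothetical placeholders, and it assumes the GSA already exists and already holds whatever roles the task needs (for example, on BigQuery). The member format PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME], the roles/iam.workloadIdentityUser role, and the iam.gke.io/gcp-service-account annotation follow the GKE Workload Identity documentation.

```python
from __future__ import annotations

import json
import shlex

# Hypothetical placeholders: substitute your own project, namespace,
# Kubernetes service account (KSA) and Google service account (GSA).
PROJECT_ID = "my-project"
NAMESPACE = "composer-tasks"
KSA_NAME = "bq-task-ksa"
GSA_EMAIL = f"bq-task-gsa@{PROJECT_ID}.iam.gserviceaccount.com"


def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """IAM principal that Workload Identity uses for a given KSA."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def k8s_manifests(namespace: str, ksa_name: str, gsa_email: str) -> dict:
    """Namespace plus annotated KSA, wrapped in a v1 List for `kubectl apply -f`."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": namespace}},
            {
                "apiVersion": "v1",
                "kind": "ServiceAccount",
                "metadata": {
                    "name": ksa_name,
                    "namespace": namespace,
                    # The annotation tells GKE which GSA this KSA acts as.
                    "annotations": {"iam.gke.io/gcp-service-account": gsa_email},
                },
            },
        ],
    }


def gsa_binding_command(gsa_email: str, member: str) -> list[str]:
    """gcloud call that allows the KSA to impersonate the GSA."""
    return [
        "gcloud", "iam", "service-accounts", "add-iam-policy-binding", gsa_email,
        "--role=roles/iam.workloadIdentityUser",
        f"--member={member}",
    ]


if __name__ == "__main__":
    member = workload_identity_member(PROJECT_ID, NAMESPACE, KSA_NAME)
    # JSON manifest for the namespace and annotated KSA.
    print(json.dumps(k8s_manifests(NAMESPACE, KSA_NAME, GSA_EMAIL), indent=2))
    # shlex.join quotes the member string, whose [..] would otherwise glob in a shell.
    print(shlex.join(gsa_binding_command(GSA_EMAIL, member)))
```

    Running it prints a JSON manifest that can be piped to kubectl apply -f - and a gcloud command to run once; both assume your kubectl context and gcloud project point at the Composer 2 environment's cluster and project.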

    Second, I updated my KubernetesPodOperator instance, setting the namespace and service_account_name parameters to the namespace and Kubernetes service account I created in the first step.
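    A minimal DAG sketch of the second step, assuming the hypothetical namespace composer-tasks, KSA bq-task-ksa, and project my-project, which must be replaced with the ones you actually created in the first step. The image and bq command are illustrative only, and schedule_interval is the Airflow 2 syntax of that era. The import path is the one used by apache-airflow-providers-cncf-kubernetes releases from around the time of this question; newer provider releases expose KubernetesPodOperator from airflow.providers.cncf.kubernetes.operators.pod instead.

```python
from datetime import datetime

from airflow import DAG
# Provider path from the 2021-era apache-airflow-providers-cncf-kubernetes;
# newer releases expose the class from airflow.providers.cncf.kubernetes.operators.pod.
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

# Hypothetical names: these must match the namespace and KSA created in step one.
NAMESPACE = "composer-tasks"
KSA_NAME = "bq-task-ksa"
PROJECT_ID = "my-project"

with DAG(
    dag_id="workload_identity_pod_example",
    start_date=datetime(2021, 10, 1),
    schedule_interval=None,  # trigger manually while testing
    catchup=False,
) as dag:
    bq_smoke_test = KubernetesPodOperator(
        task_id="bq_smoke_test",
        name="bq-smoke-test",
        # Create the pod in the namespace that holds the annotated KSA...
        namespace=NAMESPACE,
        # ...and run it as that KSA, so the metadata server hands out the bound GSA.
        service_account_name=KSA_NAME,
        # Illustrative image and command; any image using Google client libraries works.
        image="google/cloud-sdk:slim",
        cmds=["bq"],
        arguments=[f"--project_id={PROJECT_ID}", "query", "--use_legacy_sql=false", "SELECT 1"],
    )
```

    The only Workload Identity specific parts are namespace and service_account_name; everything else is ordinary pod configuration.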

    After uploading the DAG and running the task, I can confirm that these two steps let my task request the bound Google service account, and from there Google client library authentication succeeded in my test against BigQuery.
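    To check which identity a pod actually receives, a small standard-library helper can ask the metadata server for the default service account's email from inside the task container. This is a sketch under stated assumptions: it relies on the metadata path instance/service-accounts/default/email and the required Metadata-Flavor: Google header, and the function names are my own. With the KSA binding in place it should return the bound GSA email; without it, Workload Identity typically reports the pool identity (PROJECT_ID.svc.id.goog), which matches the Composer 2 symptom in the question.

```python
import urllib.request

# Metadata-server path for the pod's default service account email.
# Under GKE Workload Identity, requests to this host are answered by the
# GKE metadata server rather than the node's Compute Engine metadata server.
METADATA_EMAIL_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)


def build_request(url: str = METADATA_EMAIL_URL) -> urllib.request.Request:
    """Build a metadata request; the server rejects calls without this header."""
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def current_service_account(timeout: float = 2.0) -> str:
    """Return the service account email the metadata server reports for this pod.

    Only works inside a GCE/GKE workload; elsewhere the host does not resolve.
    """
    with urllib.request.urlopen(build_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

    Calling current_service_account() at the start of the pod's entrypoint and logging the result is a cheap way to confirm the binding before digging further into 401 errors.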