Airflow dbt via KubernetesPodOperator (Cloud Composer 2)

I am trying to run dbt jobs via Cloud Composer. The idea is to use the KubernetesPodOperator to execute "dbt run" in a pod.

The DAG is below. I am familiar with Workload Identity, yet for some reason I can't run my dbt workload: it fails with a Runtime Error, "Unable to generate access token".

from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

with DAG(
        'airflow_k8_dbt_demo',  # DAG id as it appears in the task log below
        # These args will get passed on to each operator
        # You can override them on a per-task basis during operator initialization
        default_args={
            'depends_on_past': False,
            'email': [''],
            'email_on_failure': False,
            'email_on_retry': False,
            'retries': 0,
            'retry_delay': timedelta(minutes=5),
        },
        description='A simple tutorial DAG',
        start_date=datetime(2022, 1, 1),
) as dag:
    dbt_run = KubernetesPodOperator(
        task_id="run_dbt_job_on_k8_demo",  # task id as it appears in the task log below
        name="dbt-run-k8",  # pod name prefix as it appears in the task log below
        namespace="k8-executor3",  # Some new namespace I created
        service_account_name="composer3",  # Some new Kubernetes service account I created
        # image=...  (the image argument is missing from the code as posted)
        config_file="/home/airflow/composer_kube_config",
        cmds=["bash", "-cx"],
        arguments=["dbt run --project-dir dbt_k8_demo"],
        labels={"foo": "bar"},
    )

I originally thought I was having issues with Workload Identity, but I followed the steps contained here to allow pods to authenticate to Google Cloud APIs using Workload Identity. When that didn't work, I changed the namespace to the dedicated "composer-user-workloads" namespace, which apparently has access to Google Cloud resources.

I have checked that my Composer environment has Workload Identity enabled.

This is the full error message:

*** Reading remote log from gs://europe-west9-ga4k8podv4-4edb9f24-bucket/logs/dag_id=airflow_k8_dbt_demo/run_id=manual__2023-12-11T18:08:33.155924+00:00/task_id=run_dbt_job_on_k8_demo/attempt=1.log.
[2023-12-11, 18:08:46 UTC] {} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: airflow_k8_dbt_demo.run_dbt_job_on_k8_demo manual__2023-12-11T18:08:33.155924+00:00 [queued]>
[2023-12-11, 18:08:46 UTC] {} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: airflow_k8_dbt_demo.run_dbt_job_on_k8_demo manual__2023-12-11T18:08:33.155924+00:00 [queued]>
[2023-12-11, 18:08:46 UTC] {} INFO - Starting attempt 1 of 1
[2023-12-11, 18:08:47 UTC] {} INFO - Executing <Task(KubernetesPodOperator): run_dbt_job_on_k8_demo> on 2023-12-11 18:08:33.155924+00:00
[2023-12-11, 18:08:47 UTC] {} INFO - Started process 172376 to run task
[2023-12-11, 18:08:47 UTC] {} INFO - Running: ['airflow', 'tasks', 'run', 'airflow_k8_dbt_demo', 'run_dbt_job_on_k8_demo', 'manual__2023-12-11T18:08:33.155924+00:00', '--job-id', '898', '--raw', '--subdir', 'DAGS_FOLDER/', '--cfg-path', '/tmp/tmpg5fagstz']
[2023-12-11, 18:08:47 UTC] {} INFO - Job 898: Subtask run_dbt_job_on_k8_demo
[2023-12-11, 18:08:47 UTC] {} INFO - Running <TaskInstance: airflow_k8_dbt_demo.run_dbt_job_on_k8_demo manual__2023-12-11T18:08:33.155924+00:00 [running]> on host airflow-worker-r8zh5
[2023-12-11, 18:08:47 UTC] {} INFO - Exporting env vars: AIRFLOW_CTX_DAG_EMAIL='' AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='airflow_k8_dbt_demo' AIRFLOW_CTX_TASK_ID='run_dbt_job_on_k8_demo' AIRFLOW_CTX_EXECUTION_DATE='2023-12-11T18:08:33.155924+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-12-11T18:08:33.155924+00:00'
[2023-12-11, 18:08:47 UTC] {} INFO - Building pod dbt-run-k8-mms8vkif with labels: {'dag_id': 'airflow_k8_dbt_demo', 'task_id': 'run_dbt_job_on_k8_demo', 'run_id': 'manual__2023-12-11T180833.1559240000-b8aec9c3b', 'kubernetes_pod_operator': 'True', 'try_number': '1'}
[2023-12-11, 18:08:48 UTC] {} INFO - Found matching pod dbt-run-k8-mms8vkif with labels {'airflow_kpo_in_cluster': 'False', 'airflow_version': '2.6.3-composer', 'dag_id': 'airflow_k8_dbt_demo', 'foo': 'bar', 'kubernetes_pod_operator': 'True', 'run_id': 'manual__2023-12-11T180833.1559240000-b8aec9c3b', 'task_id': 'run_dbt_job_on_k8_demo', 'try_number': '1'}
[2023-12-11, 18:08:48 UTC] {} INFO - `try_number` of task_instance: 1
[2023-12-11, 18:08:48 UTC] {} INFO - `try_number` of pod: 1
[2023-12-11, 18:08:48 UTC] {} WARNING - Pod not yet started: dbt-run-k8-mms8vkif
[2023-12-11, 18:08:49 UTC] {} WARNING - Pod not yet started: dbt-run-k8-mms8vkif
[2023-12-11, 18:08:50 UTC] {} WARNING - Pod not yet started: dbt-run-k8-mms8vkif
[2023-12-11, 18:08:51 UTC] {} WARNING - Pod not yet started: dbt-run-k8-mms8vkif
[2023-12-11, 18:08:52 UTC] {} WARNING - Pod not yet started: dbt-run-k8-mms8vkif
[2023-12-11, 18:08:53 UTC] {} WARNING - Pod not yet started: dbt-run-k8-mms8vkif
[2023-12-11, 18:09:01 UTC] {} INFO - [base] + dbt run --project-dir dbt_k8_demo
[2023-12-11, 18:09:02 UTC] {} INFO - [base] 18:09:01  target not specified in profile 'dbt_k8_demo', using 'default'
[2023-12-11, 18:09:03 UTC] {} INFO - [base] 18:09:02  Running with dbt=1.0.4
[2023-12-11, 18:09:06 UTC] {} INFO - [base] 18:09:03  Partial parse save file not found. Starting full parse.
[2023-12-11, 18:09:06 UTC] {} INFO - [base] 18:09:06  Found 2 models, 4 tests, 0 snapshots, 0 analyses, 188 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
[2023-12-11, 18:09:06 UTC] {} INFO - [base] 18:09:06
[2023-12-11, 18:09:06 UTC] {} INFO - [base] 18:09:06  Encountered an error:
[2023-12-11, 18:09:06 UTC] {} INFO - [base] Runtime Error
[2023-12-11, 18:09:06 UTC] {} INFO - [base]   Unable to generate access token, if you're using impersonate_service_account, make sure your initial account has the "roles/iam.serviceAccountTokenCreator" role on the account you are trying to impersonate.

[2023-12-11, 18:09:07 UTC] {} INFO - [base]   ("Failed to retrieve from the Google Compute Engine metadata service. Status: 404 Response:\nb'Unable to generate access token; IAM returned 404 Not Found: Not found; Gaia id not found for email\\n'", <google.auth.transport.requests._Response object at 0x7ff528226d00>)

[2023-12-11, 18:09:07 UTC] {} WARNING - Follow requested but pod log read interrupted and container base still running
[2023-12-11, 18:09:08 UTC] {} INFO - [base] 18:09:06  Found 2 models, 4 tests, 0 snapshots, 0 analyses, 188 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
[2023-12-11, 18:09:08 UTC] {} INFO - [base] 18:09:06
[2023-12-11, 18:09:08 UTC] {} INFO - [base] 18:09:06  Encountered an error:
[2023-12-11, 18:09:08 UTC] {} INFO - [base] Runtime Error
[2023-12-11, 18:09:08 UTC] {} INFO - [base]   Unable to generate access token, if you're using impersonate_service_account, make sure your initial account has the "roles/iam.serviceAccountTokenCreator" role on the account you are trying to impersonate.

[2023-12-11, 18:09:08 UTC] {} INFO - [base]   ("Failed to retrieve from the Google Compute Engine metadata service. Status: 404 Response:\nb'Unable to generate access token; IAM returned 404 Not Found: Not found; Gaia id not found for email\\n'", <google.auth.transport.requests._Response object at 0x7ff528226d00>)

[2023-12-11, 18:09:08 UTC] {} INFO - Pod dbt-run-k8-mms8vkif has phase Running
[2023-12-11, 18:09:10 UTC] {} INFO - Deleting pod: dbt-run-k8-mms8vkif
[2023-12-11, 18:09:11 UTC] {} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/", line 592, in execute
    return self.execute_sync(context)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/", line 632, in execute_sync
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/", line 765, in cleanup
    raise AirflowException(
airflow.exceptions.AirflowException: Pod dbt-run-k8-mms8vkif returned a failure.
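
The 404 in the log above comes from the metadata service rather than from dbt itself, so one way to narrow things down is to ask the metadata server which identity the pod actually sees. The following is a minimal sketch, assuming it is run from inside a pod that uses the same namespace and Kubernetes service account as the task; the helper names are made up for illustration, and only the Python standard library is used.

```python
import urllib.request

# Standard metadata server root and the path that reports the active identity.
METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1"
EMAIL_PATH = "/instance/service-accounts/default/email"


def build_metadata_request(path: str = EMAIL_PATH) -> urllib.request.Request:
    """Build (but do not send) a GET request for the metadata server."""
    # The metadata server rejects requests that lack this header.
    return urllib.request.Request(
        METADATA_ROOT + path,
        headers={"Metadata-Flavor": "Google"},
    )


def fetch_service_account_email(timeout: float = 5.0) -> str:
    """Return the identity the metadata server reports for this pod.

    Only meaningful when executed inside a pod on the cluster.
    """
    with urllib.request.urlopen(build_metadata_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Calling fetch_service_account_email() from the pod should, as far as I understand Workload Identity, print the Google service account email taken from the Kubernetes service account's annotation. If that email is misspelled or points at an account that does not exist, it would be consistent with the "Gaia id not found for email" message above.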


  • As is often the case with this kind of GCP/GCC project, the issue was with Workload Identity. I had incorrectly annotated the Kubernetes service account with the Google service account when following this page; specifically, I did not use this command correctly:

    kubectl annotate serviceaccount KSA_NAME \
        --namespace NAMESPACE \ 

    As a result, the Kubernetes service account did not have permission to run my BigQuery transformations.
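
Because a single typo in either string is enough to break the binding, it can help to generate both halves of the Workload Identity link instead of typing them by hand. The sketch below is only an illustration: the function names are hypothetical, and the project and account names in the usage note are placeholders. It builds the annotation that goes on the Kubernetes service account and the IAM member that needs roles/iam.workloadIdentityUser on the Google service account.

```python
# Sketch only: helper names are hypothetical; the string formats follow the
# GKE Workload Identity conventions.

ANNOTATION_KEY = "iam.gke.io/gcp-service-account"


def gsa_email(gsa_name: str, gsa_project: str) -> str:
    """Email of the Google service account the pod should act as."""
    return f"{gsa_name}@{gsa_project}.iam.gserviceaccount.com"


def ksa_annotation(gsa_name: str, gsa_project: str) -> str:
    """key=value pair to pass to `kubectl annotate serviceaccount`."""
    return f"{ANNOTATION_KEY}={gsa_email(gsa_name, gsa_project)}"


def workload_identity_member(
    cluster_project: str, namespace: str, ksa_name: str
) -> str:
    """IAM member that must hold roles/iam.workloadIdentityUser on the GSA."""
    return f"serviceAccount:{cluster_project}.svc.id.goog[{namespace}/{ksa_name}]"
```

For example, ksa_annotation("my-gsa", "my-project") gives the value to append to the kubectl annotate command, and workload_identity_member("my-project", "k8-executor3", "composer3") gives the member for the IAM policy binding. Here "my-gsa" and "my-project" are placeholders, while the namespace and service account name are the ones from the question.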