We have a dbt application that we run in pods using Apache Airflow on AWS. I have a model that I need to run for a specific appId. I need to pass appId as a parameter at runtime in the CLI command, so that the model runs only for that appId.
We use the following dbt command to run a model on our local machine:
dbt -d run --model abc
The Apache Airflow code that we use to run the dbt model is:
abc_task = KubernetesPodOperator(
    namespace='etl',
    image=f'blotout/dbt-analytics:{TAG_DBT_ANALYTICS}',
    cmds=["/usr/local/bin/dbt"],
    arguments=['run', '--models', 'abc_task'],
    env_vars=env_var,
    name="abc_task",
    configmaps=['awskey'],
    task_id="abc_task",
    get_logs=True,
    dag=dag,
    is_delete_operator_pod=True,
)
We need something like this:
{%- set appIdList = [var("appId")] -%}
The value of appId should be passed in from the Airflow task shown above, as part of the CLI command.
Have you considered using environment variables to share the information from your Apache Airflow DAG with the Kubernetes Pod running dbt?
In your case, you could declare APP_ID within the env_vars dictionary.
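As a rough sketch of what that could look like, the helper below builds the keyword arguments for the operator and adds APP_ID to the environment. The function name build_dbt_pod_kwargs, the sample appId value, and the DBT_PROFILES_DIR entry standing in for your existing env_var dictionary are all assumptions made for illustration; the other values are copied from the question.

```python
from typing import Any, Dict


def build_dbt_pod_kwargs(app_id: str, base_env: Dict[str, str]) -> Dict[str, Any]:
    """Return KubernetesPodOperator keyword arguments with APP_ID added.

    Hypothetical helper: `base_env` stands in for the existing `env_var` dict.
    """
    env = dict(base_env)       # copy, so the caller's dict is not mutated
    env["APP_ID"] = str(app_id)  # environment variable values must be strings
    return {
        "namespace": "etl",
        "cmds": ["/usr/local/bin/dbt"],
        "arguments": ["run", "--models", "abc_task"],
        "env_vars": env,
        "name": "abc_task",
        "configmaps": ["awskey"],
        "task_id": "abc_task",
        "get_logs": True,
        "is_delete_operator_pod": True,
    }


# Placeholder values, not taken from the question.
kwargs = build_dbt_pod_kwargs("my-app-123", {"DBT_PROFILES_DIR": "/dbt"})
print(kwargs["env_vars"])
```

In the DAG file, these could then be passed along with the image and dag arguments from the question, for example KubernetesPodOperator(image=..., dag=dag, **kwargs).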
Inside the dbt model file, you can use the env_var function to incorporate environment variables from the system into the model using Jinja:
{{ env_var('APP_ID') }}
The dbt docs give more details about this feature: https://docs.getdbt.com/reference/dbt-jinja-functions/env_var
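If you would rather keep the var("appId") form from the question, another option that may fit is dbt's --vars flag, which takes a YAML or JSON string of variables on the command line. The sketch below only builds the arguments list for the operator; the helper name dbt_run_arguments and the sample appId value are assumptions for illustration.

```python
import json
from typing import List


def dbt_run_arguments(model: str, app_id: str) -> List[str]:
    """Build the dbt CLI arguments, passing appId through --vars.

    Hypothetical helper. json.dumps produces a JSON object, which is also
    valid YAML, so dbt can parse it as the --vars payload.
    """
    vars_payload = json.dumps({"appId": app_id})
    # Each list item is passed to the container as a separate argument,
    # so no extra shell quoting is needed around the JSON string.
    return ["run", "--models", model, "--vars", vars_payload]


arguments = dbt_run_arguments("abc_task", "my-app-123")
print(arguments)
```

The resulting list would replace the arguments value in the operator call, and the model would then read the value with var("appId").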