Search code examples
google-bigquerypartitioningpartitiondbt

How do you partition tables in BigQuery using DBT


I am new to DBT and have previously been using Airflow for data transformations.

In Airflow there is a variable called {{ ds }} which represents the logical date in this form YYYY-MM-DD and {{ ds_nodash }} which represents the logical date in this form YYYYMMDD. I can then set up a task similar this:

my_task = BigQueryOperator(
  task_id='t_my_task',
  sql= """ SELECT * FROM my_table where my_date="{{ ds }}" """,
  destination_dataset_table='my_project.my_dataset.my_table_new${{ ds_nodash }}',
  write_disposition='WRITE_TRUNCATE',
  dag=dag
)

This means that I am running the SQL query given on the third line and this will overwrite the table on the fourth line. In the Airflow interface, if I rerun say just the day "2022-01-11" then it will automatically overwrite that partition on that date.

I am trying to figure out how to do the same in DBT.


Solution

  • With DBT you can do that by using an incremental model.

    In dbt, you describe the data you want with SQL statements, and the materialization you choose dictates the way it's going to be materialized in your warehouse:

    • ephemeral materialization: a CTE that will be short-lived,
    • table materialization: a BQ table, but everytime you re-run your DBT project it gets deleted and re-created (not suitable for big tables)
    • view materializations: views...
    • and incremental models

    With incremental models you basically have a big table where you want to insert new rows. This materialization lets you add rules such as "insert rows from table source_table where timestamp >= today".

    In your case, with DBT + BigQuery, you have 2 options: merge or insert+ overwrite, but from your description you'll want to use the latter.

    You'll need to include this at the beginning of your model:

    {{
      config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        ...
      )
    }}
    

    For reference you can go there and there.