Search code examples
google-bigquerydatabase-partitioningdbt

Prune BigQuery partitions when building incremental table with DBT


I have the same problem again as this question:

What's the problem?

How do you make an incremental table using BigQuery in DBT without scanning the entire table every time?

The suggested incremental table format doesn't work (it scans the whole table) and DECLARE isn't supported (I think?).

Details

The suggested incremental format for DBT involves something like this:

{% if is_incremental() %}
WHERE 
   AND _partitiontime > (select max(_partitiontime) from  `dataset.table`)
{% endif %}

First run with incremental table builds and adds a lot of rows. enter image description here

Second run adds a tiny number of rows but still scans the entire table. enter image description here

BigQuery will scan the entire table for every incremental run, meaning you're paying the full cost every day.

The recommended solution by BigQuery is to use DECLARE:

DECLARE max_date DATE;
SET max_date = (select max(_partitiontime) from `dataset.table`); 

This post suggests that isn't possible.

Is there a workaround people are doing here? Is there some sort of escaped way I can set up DECLARE with DBT, or another solution I haven't seen?

Other context

I've previously posted a version of it involving Data Studio:

Couldn't figure out how to do it there either.


Solution

  • Turns out there is a DBT shortcut _dbt_max_partition which goes through the steps of declaring a variable and so the partitions are correctly pruned.

     {% if is_incremental() %}
          AND _partitiontime >= _dbt_max_partition
     {% endif %}
    

    I found an excellent dbt helper post here.