I have two large tables I'm trying to join & filter using dbt.
The SQL is very simple, roughly:
SELECT
u.user_id, t.transaction_id
FROM users u
JOIN transactions t ON t.user_id = u.user_id
WHERE u.active = 1
Currently I'm using the "table" materialization, but this is fairly wasteful, as the tables underlying tables are 99.99% the same from run to run.
However, I don't understand from the DBT documentation how I could set this model to "incremental".
Any ideas?
PS. I'm running on SQL Server.
As @anders-swanson wrote in his comment, if transaction_id
is definitely unique, you could set it as the unique_key
and materialize your model as an incremental table.
dbt's docs explain how to do this. Using your example, it might be:
{{
config(
materialized='incremental',
unique_key='transaction_id'
)
}}
select
u.user_id, t.transaction_id
from users u
join transactions t ON t.user_id = u.user_id
where u.active = 1
If transaction_id
is not unique but transaction_id
||user_id
is, you could try creating a new column which concatenates these columns in an upstream dbt model, and then assigning that as the unique_key
:
{{
config(
materialized='incremental',
unique_key='pkey'
)
}}
select
u.user_id,
t.transaction_id,
u.user_id||t.transaction_id as pkey
from users u
join transactions t ON t.user_id = u.user_id
where u.active = 1
Otherwise, you'll have to pull in a column that is either a) unique, or b) has an ordered quality that could be used to apply an is_incremental()
filter (like @viacheslav-nefedov wrote).