Search code examples
snowflake-cloud-data-platformdbt

How to use dbt's incremental model without duplicates


I have a query that selects some data that I would like to use to create an incremental table. Something like:

{{
    config(
        materialized='incremental',
        unique_key='customer_id'
    )
}}
SELECT
    customer_id,
    email,
    updated_at,
    first_name,
    last_name
FROM data

The input data has duplicate customers in it. If I read the documentation correctly, then records with the same unique_key should be seen as the same record. They should be updated instead of creating duplicates in the final table. However, I am seeing duplicates in the final table instead. What am I doing wrong?

I am using Snowflake as a datawarehouse.


Solution

  • If your source table already contains the duplicate, this is the regular behavior.

    As per dbt documentation: "The first time a model is run, the table is built by transforming all rows of source data."

    Docs: https://docs.getdbt.com/docs/build/incremental-models

    This means basically that the duplicates will be avoided in all future loads, but not during the initial creation. Hence you need to change your SELECT statement so that duplicates are somehow filtered out in the creation itself.