Search code examples
amazon-redshiftetldbt

Implementing scd2 in dbt using aws redshift, how do I define conditional natural keys?


Implementing scd2 in dbt using aws redshift. How do I define conditional natural keys?

unique_id = ['crm_id', 'curr_recrd_flg', 'actve_flg']

I want to provide conditions like curr_recrd_flg = 'Y' and actve_flg = 'Y'

Thanks in advance!


Solution

  • dbt can automate the creation of an SCD Type-2 table from a dimension table using its snapshot feature.

    First, you define your snapshot in a .sql file inside of a directory called snapshots inside of your project directory:

    -- snapshots/my_snapshot.sql
    {% snapshot my_snapshot() %}
    {{
        config(
          target_database='analytics',
          target_schema='snapshots',
          unique_key='crm_id',
    
          strategy='timestamp',
          updated_at='updated_at',
        )
    }}
    -- QUERY TEXT HERE
    {% endsnapshot %}
    

    The query in your snapshot definition can do anything. In your case, it seems like you want to filter on the "current record" fields, so that would look like:

    -- snapshots/my_snapshot.sql
    {% snapshot my_snapshot() %}
    {{
        config(
          target_database='analytics',
          target_schema='snapshots',
          unique_key='crm_id',
    
          strategy='timestamp',
          updated_at='updated_at',
        )
    }}
    select *
    from {{ source('my_source', 'my_table') }}
    where curr_recrd_flg = 'Y' and actve_flg = 'Y'
    {% endsnapshot %}
    

    Then you can execute dbt snapshot at the command line, and dbt will capture the rows returned by your query. Executing dbt snapshot again will execute the query again and update the snapshot table with any changes, in the style of SCD-2.