Search code examples
sqldbt

Insert only specific columns with DBT


As of now, I have an INSERT INTO clause, that works like this:

insert into final_table (name, age, created_at)
select *
from person_table

The final_table has 4 columns: ID (incremental/serial), name, age and created_at. The person_table has only name, age, and created_at.

When I translate it to the dbt logic, it looks like this:

{{
    config(
        materialized = 'incremental'
    )
}}

with person_table as (
  select *
  from person_table
)

select
   name,
   age,
   created_at
from person_table

{% if is_incremental() %}
    AND created_at > (select max(created_at) from {{ this }})
{% endif %}

However, dbt keeps compiling as an INSERT CLAUSE containing the ID (that it is auto-generated). Does anyone know how to insert only the specific columns that I want?


Solution

  • You can use row_number to fake an auto-incrementing key in Snowflake.

    In a model materialized as a table, that would be as simple as:

    select
       name,
       age,
       created_at,
       row_number() over (order by created_at asc) as id
    from person_table
    

    But since this is an incremental model, your inner query will be filtered before row_number() is computed, so we need to add on the maximum value of id in the table if the model is being run in incremental mode:

    {{
        config(
            materialized = 'incremental'
        )
    }}
    
    {% if is_incremental() %}
        {% set max_id = "(select max(id) from " ~ this ~ ")" %}
    {% else %}
        {% set max_id = 0 %}
    {% endif %}
    
    with person_table as (
      select *
      from {{ ref('person') }}
    )
    
    select
       name,
       age,
       created_at,
       {{ max_id }} + row_number() over (order by created_at asc) as id
    from person_table
    where 1=1
    {% if is_incremental() %}
        AND created_at > (select max(created_at) from {{ this }})
    {% endif %}
    

    Note that, if you --full-refresh this model, and records were deleted from person_table, then any records created after the deleted records will have their IDs changed. No real way around that, unfortunately! If you need an immutable ID, I would suggest hashing some combination of name, age, and created_at using md5 or dbt_utils.surrogate_key. Of course, those hashed ID's would not be monotonically increasing.