Search code examples
snowflake-cloud-data-platformdbt

Automatically add new column to incremental (or other type)


I need some wisdom for a new DBT use case that I am trying to solve. I am pretty new to DBT and not sure what is the most efficient DBT way. We are using snowflake as our DWH.

Problem

We have a lot of incremental models that are managed with DBT. Lately, we had the need to add a new column to all models. What would be the most efficient DBT way to do it? Should we override the incremental macro script? (I found this for snowflake.) I assume that the last resort will be to add the new column manually to each model.


Solution

  • You can either --full-refresh all the incremental models or perform this schema migration outside of dbt.

    I would recommend using --full-refresh if you are able to. since --full-refresh rebuilds the table, it takes care of the schema changes and the historical values of the new column.

    On the current version of dbt v0.21.0, a new incremental setting was introduced, on_schema_change. You can set it to append_new_columns

    Quoting some relevant sections from the documentation:

    New on_schema_change config in dbt version v0.21.0 Incremental models can now be configured to include an optional on_schema_change parameter to enable additional control when incremental model columns change. These options enable dbt to continue running incremental models in the presence of schema changes, resulting in fewer --full-refresh scenarios and saving query costs.

    append_new_columns: Append new columns to the existing table. Note that this setting does not remove columns from the existing table that are not present in the new data.

    Note: None of the on_schema_change behaviors backfill values in old records for newly added columns. If you need to populate those values, we recommend running manual updates, or triggering a --full-refresh.

    If --full-refresh is not an option or you're on an older dbt version, then your schema migration would have to be done manually.

    The steps are:

    1. add the new column via an alter table alter table my_incremental_table add column new_column_name data_type
    2. run an update query to hydrate the new column
    3. edit the dbt model for my_incremental_table to add new_column_name to the end of the select query's list of columns

    This will work because dbt is stateless, but as this is a manual operation, I don't recommend this if you can avoid it.

    note also, that if you use the on_schema_change method, you'd still need to do a backfill of the new column manually.