I need some wisdom for a new DBT use case that I am trying to solve. I am pretty new to DBT and not sure what is the most efficient DBT way. We are using snowflake as our DWH.
We have a lot of incremental models that are managed with DBT. Lately, we had the need to add a new column to all models. What would be the most efficient DBT way to do it? Should we override the incremental macro script? (I found this for snowflake.) I assume that the last resort will be to add the new column manually to each model.
You can either --full-refresh
all the incremental models or perform this schema migration outside of dbt.
I would recommend using --full-refresh
if you are able to. since --full-refresh
rebuilds the table, it takes care of the schema changes and the historical values of the new column.
On the current version of dbt v0.21.0
, a new incremental setting was introduced, on_schema_change
. You can set it to append_new_columns
Quoting some relevant sections from the documentation:
New on_schema_change config in dbt version v0.21.0 Incremental models can now be configured to include an optional on_schema_change parameter to enable additional control when incremental model columns change. These options enable dbt to continue running incremental models in the presence of schema changes, resulting in fewer --full-refresh scenarios and saving query costs.
append_new_columns: Append new columns to the existing table. Note that this setting does not remove columns from the existing table that are not present in the new data.
Note: None of the on_schema_change behaviors backfill values in old records for newly added columns. If you need to populate those values, we recommend running manual updates, or triggering a --full-refresh.
If --full-refresh
is not an option or you're on an older dbt version, then your schema migration would have to be done manually.
The steps are:
alter table my_incremental_table add column new_column_name data_type
my_incremental_table
to add new_column_name
to the end of the select query's list of columnsThis will work because dbt is stateless, but as this is a manual operation, I don't recommend this if you can avoid it.
note also, that if you use the on_schema_change
method, you'd still need to do a backfill of the new column manually.