Search code examples
dbt

DBT: I need to load historic data in DBT into my type 2 SCD dims. I have parameterized the query. How can I load the history without manually doing so


I need to load historic data in DBT into my type 2 SCD dims. I have parameterized the query. How can I load the history without manually changing the parameter in dbt for each day. I need to load 5 years of historic data.

Example; I can run the parameter dbt run --vars '{"run_date": "1/30/2022"}' to run the data for one day. Is there a way where I can create a loop to cycle thru the run dates and load one day at a time of the history?


Solution

  • If your model can absolutely only operate on one date at a time:

    1. You could create a custom dbt materialization to do the backfill for you. See the insert by period materialization for inspiration (note, this macro itself will not do exactly what you want). This has the benefit of being easier to re-run if you ever need to rebuild this table (which you probably will, someday)

    2. It would probably be easier to write a shell script to just run dbt many times. See this answer on how to do that

    But really the best option for you is going to be to re-work your model logic so you can pass in a range of dates (or a list of dates) to your model. This is probably going to be hard, but otherwise building this model in multiple environments is going to be a huge pain.

    As an aside, it seems like what you're building is a snapshot, not a model. Running models should be idempotent, and models should be built into all dev environments; snapshots not so much.