Search code examples
databasepostgresqlfunctional-programmingetlairflow

What is a "dimension snapshot" in a functional data engineering approach to deal with slowly changing dimensions?


As mentioned by Maxime Beauchemin in his popular post Functional Data Engineering — a modern paradigm for batch data processing, Maxime recommends dealing with slowly changing dimensions by taking dimension snapshots where a new partition is appended at each ETL schedule.

But how do we model this in a functional data warehouse without mutating data? Simple. With dimension snapshots where a new partition is appended at each ETL schedule. The dimension table becomes a collection of dimension snapshots where each partition contains the full dimension as-of a point in time.

I tried to dig in the comments for an answer but could not find a simple explanation. What is meant here by taking a snapshot of the dimension and appending it to the daily partition?


Solution

  • A snapshot is a table value that was the value of some base/variable table as of some time/version. Here "dimension snapshot" seems to mean "dimension table snapshot". The author seems to suggest a new regime whose dimension table is like the old one with an added time/version column/dimension on which they partition. Under the old regime we make updates to the dimension table, at certain times, each a version. Under the new regime each time/version we would have updated the old dimension table to some new state we instead take (a snapshot of) that state, add the new column set to the time/version, and add those rows to the new dimension table.

    Ask the author, it's a recent blog.