My stakeholders are requesting that after each model run that inserts or updates data (through a merge) into a table, a log table be updated with something like this:
log_table:
TABLE_LOADED | INSERTED_ROWS | UPDATED_ROWS
TAB_1        | 1500          | 1000
Any idea on how to approach this? How could I identify and count the inserted rows and updated rows separately, maybe with a post-hook or a package?
Thanks
Apart from the Usage views in Snowflake, my suggestion is to leverage row-level metadata in every model (see https://infinitelambda.com/dbt-observability-row-level-metadata/). If your models carry timestamp fields such as inserted_at and updated_at, you can tell the rows apart by comparing them:

- inserted row: inserted_at = updated_at = T1
- updated row: inserted_at = T1, updated_at = T2
- (soft-)deleted row: inserted_at = T1, updated_at = T2 and is_deleted = true

With these columns in place, you can use a pre/post-hook to count the rows and insert the counts into the log table:
--count inserted rows
select count(*)
from {{ this }}
where inserted_at = updated_at
and inserted_at > '{{ run_started_at }}'
--or, better, have a variable in the model that holds the model's start time (be mindful of the timezone)
--count updated rows
select count(*)
from {{ this }}
where inserted_at < updated_at
and updated_at > '{{ run_started_at }}'
--or, better, have a variable in the model that holds the model's start time (be mindful of the timezone)
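Putting the two counts together, one way to wire this up is a macro called from the model's post-hook. This is only a sketch, not a tested implementation: the macro name `log_row_counts`, the `analytics.log_table` target, and its `logged_at` column are assumptions based on the schema in the question; `run_started_at` is dbt's built-in run start timestamp (UTC, so mind the timezone).

```sql
-- macros/log_row_counts.sql (hypothetical macro name)
{% macro log_row_counts(relation) %}
    insert into analytics.log_table (table_loaded, inserted_rows, updated_rows, logged_at)
    select
        '{{ relation.identifier }}',
        -- inserted this run: both timestamps set together
        (select count(*) from {{ relation }}
         where inserted_at = updated_at
           and inserted_at > '{{ run_started_at }}'),
        -- updated this run: updated after the original insert
        (select count(*) from {{ relation }}
         where inserted_at < updated_at
           and updated_at > '{{ run_started_at }}'),
        current_timestamp
{% endmacro %}
```

You would then reference it from the model config, e.g. `{{ config(post_hook="{{ log_row_counts(this) }}") }}`, so the counts are written right after each merge completes.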
Hope this helps!