We are building a Data Vault (2.0)
model to capture SalesForce data. Like many other sources, the records in the source are soft deleted. While we are sourcing data to the Data Model, we do not want to filter any data & also capture deleted records in the target system. Searched for the best practice to handle deleted records in the Data Vault model, however no luck. Can someone please throw some light here? Should we add IsDeleted
flag in Hub
or Satellite
considering the future expansion of the model & best design practices to follow. Also, any reference material links here would be of great help. Thank you.
In the DV2.0 specification, there's a "Record source tracking satellite" that does exactly what you want to do, keep tracks of inserted, updated and deleted records per source (p.143 if you have the book).
Basically, this is a satellite with a hashkey, loaddate, record source and a status (I/U/D). You insert a record each time a record is added, updated or deleted in a source system.
In DV1.0, there was an "last seen" field, but has been removed for performance issue ( https://danlinstedt.com/allposts/datavaultcat/end_of_updates/ )