Search code examples
ssisbusiness-intelligencestaging

About staging tables and merge


I'm really new in the BI world, and some concepts seems misunderstood for me.

I'm reading some articles and books about this, they are full of graphics and flows that does not tell much about the process in practice.

About the staging tables and the extraction process.

I know that the tables in staging area need to be deleted after the flow has been executed.

Considering this, imagine a flow with a initial full extraction to the target database. Then, using a merge cdc, i need to identify what was updated in the source tables. My doubt is here, how can i know what was updated since my tables are on the target, and the data on staging has been deleted?

I need to bring the data of the target tables to the staging area and then do the merge?


Solution

  • Change Data Capture (CDC) is usually done on the source system, either with an explicit changed field (either a simple boolean or a timestamp) or automatically by the underlying database management system.

    If you have a timestamp field in your data you first do your initial load to staging, record the maximum timestamp retrieved, and then on the next update you only retrieve records where the timestamp is greater than your recorded value. This is the preferred way to do it if there's no real CDC functionality on the source system.

    Using a boolean field is trickier as all inserts and updates to the source-system must set it to true and after your extraction you'll have to reset it to false.