Search code examples
sqlsql-serverssisetlsql-job

Best method for an ETL Package to load all data initially and then next run to only load changes or new items


I am new to SSIS and would like the best method for an ETL package creation to load all data initially and then next run to only load changes or new items. I will use the package on a schedule SQL job. I know that I could set SSIS package to truncate the destination table but for me that is a waste of resources and if large tables are concern the transaction log would be huge.

Thanks in advance!


Solution

  • I think there are more than one method to achieve that:

    1. Adding a reference table that contains the last inserted ID (or primary key value), and this table must be update each time.
    2. Using Change Data Capture (CDC), you can refere to the this article for more information about it: Introduction to Change Data Capture (CDC) in SQL Server 2008 (Note that the database must be configured to enable CDC - may not work with your situation)
    3. Adding a LastUpdate column to the table, and read only date where LastUpdate column value is higher than the last running SSIS job date.