Search code examples
azurestored-proceduresazure-data-factoryazure-synapse

Delete sink records when records are deleted in source with ADF pipeline copy data activity


In ADF I've created a pipeline which contains multiple copy data activities. All of those activities have SQL as a source and SQL as a sink.

I'm making use of the upsert function as much as I can to perform an insert or update action. This function doesn't support delete actions, so that's why I've found out that I'm having a mismatch of records in some of the the source and sink tables.

So in my case I've the following data issue, on the left the source table and on the right the sink table (destination). Where I'm expecting to have in the sink table the same data as in the source data table:

CustomerId CustomerId
1 1
2 2
3
4 4
5 5

What is the best way to solve this issue?

I could truncate the sink table with every pipeline run and create another destination table. So that when the new data in the sync tables has arrived I'm executing a stored procedure which merges the data into a destination table. This way I'm sure that the user of the sync tables (which is mostly a business intelligence tool) has data all the time. But this feels a bit old school... I've the feeling that there should be another (better) solution to solve this.

Am I supposed to solve those kind of issues in a data flow? I've quite a lot of tables where in the source application it's allowed to remove records. This would mean that for every table I've to create a data flow, this can be quite data/time consuming.


Solution

    • Using the truncate table on sink and creating a new table would achieve the job for you.

    enter image description here

    • Dataflow's alter row transformation only allows you to only alter the data from transformed source data. Since sink has to be checked, dataflows would complicate the requirement.

    • If majority of the records are to be deleted from sink to match the source data, then truncating might be the better option.

    • As an alternative in case there are very few records that need to be deleted, you can stage the source table into the destination database.

    • Using this temporary table, delete records which are not present in source table from sink table using a query similar to below:

    delete from t2 where ID not in (select ID from t1)
    
    • To copy the source table from source database to destination database, you can refer to this SO answer.