
Azure Data Factory - Retry for an insert Copy Activity (AzureSQL DB)


We have twice run into an intermittent issue where copy activities fail with:

A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) On the next run, the issue is gone.

For SQL, say 100k records are written in batches of 10k: will we end up with duplicate records if something fails in the middle of the copy activity? I believe the copy activity is not treated as a single DB transaction.

For UPSERT (copy activities) in SQL, we do have retry enabled, since the key columns ensure no duplicates are created. We’re wondering if we can also enable retry for INSERT (copy activities).

In our other projects, we do have retry enabled for the copy activities involving files, since per the link below, resume simply picks up from the file that failed: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview#resume-from-last-failed-run

Resume happens at file level. If copy activity fails when copying a file, in next run, this specific file will be re-copied.

Question is - will it be safe to enable RETRY for Copy Activities doing SQL inserts (Azure SQL to another Azure SQL table)? Will it cause us to run into duplicate records when a transient error happens in the middle of the operation?


Solution

  • Unfortunately, copy activities in ADF are not transaction-bound. Unless a pre-copy script is involved, a copy activity only appends data, so a retry after a mid-copy failure can create duplicates. The safest approach is to copy into a staging table, then use a Stored Procedure activity to move the data into the final table, where the move is bound within a single transaction.
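A minimal T-SQL sketch of that staging pattern, assuming hypothetical tables `dbo.Orders` / `dbo.Orders_Staging` with key column `OrderId` (adjust names and columns to your schema):

```sql
-- Step 1: point the ADF copy activity at the staging table, with a
-- pre-copy script that clears it, so a retried copy starts clean:
--     TRUNCATE TABLE dbo.Orders_Staging;

-- Step 2: a stored procedure (called from a Stored Procedure activity)
-- that moves staged rows into the final table inside one transaction.
-- The NOT EXISTS guard makes the move idempotent, so retrying the
-- procedure itself also cannot create duplicates.
CREATE OR ALTER PROCEDURE dbo.usp_LoadOrdersFromStaging
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- roll back the whole transaction on any error

    BEGIN TRANSACTION;

    INSERT INTO dbo.Orders (OrderId, CustomerId, Amount)
    SELECT s.OrderId, s.CustomerId, s.Amount
    FROM dbo.Orders_Staging AS s
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.Orders AS o
        WHERE o.OrderId = s.OrderId
    );

    -- Clear the staging table only after a successful insert;
    -- TRUNCATE is transactional in SQL Server, so a failure rolls
    -- both statements back together.
    TRUNCATE TABLE dbo.Orders_Staging;

    COMMIT TRANSACTION;
END;
```

With this split, retry can be enabled on both the copy activity (a re-run just re-stages the data) and the stored procedure activity (the idempotent insert makes a re-run harmless).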