Tags: azure, azure-data-factory, azure-data-explorer

How to deal with duplicate insertion in Azure Data Explorer


I have an ADF pipeline that copies data from ADLS to Azure Data Explorer (ADX). It reads all the data and copies it to ADX successfully. The problem is that every time the pipeline runs and ingests data, the ADX table ends up with duplicate rows.

The source folder contains log files produced by a copy activity that is triggered every two hours with new files. Each log contains the details of the newly copied files.

So how do I fix the duplication? Is there a way to upsert?


Solution

  • As Nikolai suggested in his third point: clear the destination table prior to export, that is, delete all records from the table and then export your latest batch. This is not really a great option, but it could be a quick and dirty solution if your source stores all the data that you need.

    Since I only needed a quick fix, I used this approach and implemented it as follows.

    I used an Azure Data Explorer Command activity followed by a Copy activity. It works like a charm. The ADX Command activity can run queries as well as control (management) commands.

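The clear-then-copy setup described above can be sketched as a pipeline definition. This is a minimal sketch under assumptions, not the exact pipeline from the answer: the helper `build_clear_then_copy_pipeline`, the activity names, the table name `CopyActivityLogs`, and the linked service name `AdxLinkedService` are all hypothetical. It assumes the ADX Command activity uses the `AzureDataExplorerCommand` type with a `command` property, and that the table is emptied with the `.clear table <name> data` management command. The Copy activity's source and sink are left out because they depend on your datasets and file format.

```python
import json


def build_clear_then_copy_pipeline(table_name: str, adx_linked_service: str) -> dict:
    """Build an ADF pipeline definition: clear the ADX table, then run the copy.

    All names here are illustrative placeholders, not values from the original post.
    """
    clear_activity = {
        "name": "ClearTargetTable",  # hypothetical activity name
        "type": "AzureDataExplorerCommand",
        "linkedServiceName": {
            "referenceName": adx_linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            # Management command that removes all rows but keeps the table and its schema.
            "command": f".clear table {table_name} data",
            "commandTimeout": "00:10:00",
        },
    }

    copy_activity = {
        "name": "CopyLogsToAdx",  # hypothetical activity name
        "type": "Copy",
        # Run the copy only after the table has been cleared successfully.
        "dependsOn": [
            {
                "activity": clear_activity["name"],
                "dependencyConditions": ["Succeeded"],
            }
        ],
        # Source, sink, and dataset references are intentionally omitted;
        # fill them in to match your ADLS files and ADX table.
        "typeProperties": {},
    }

    return {
        "name": "ClearAndReloadAdxTable",  # hypothetical pipeline name
        "properties": {"activities": [clear_activity, copy_activity]},
    }


if __name__ == "__main__":
    pipeline = build_clear_then_copy_pipeline("CopyActivityLogs", "AdxLinkedService")
    print(json.dumps(pipeline, indent=2))
```

Because the copy depends on the clear step succeeding, a failed clear does not produce a second, duplicated load. The table is briefly empty between the two activities, so this only suits cases where the source still holds every record you need and readers can tolerate that gap.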