Tags: azure-data-factory, azure-data-explorer, azure-cosmosdb-mongoapi

Azure Data Explorer: dealing with duplicates and deleted records


I am trying to ingest data into my Azure Data Explorer table from my Azure Cosmos DB for MongoDB collection. I was able to do this with Azure Data Factory by creating a pipeline that fetches all records every x hours.

The problem is that when the pipeline fetches these records, it inserts them into the Data Explorer table regardless of whether they already exist. Also, if a record is deleted from my MongoDB collection, it is never deleted from my Azure Data Explorer table.

I tried to create a data flow with Data Explorer as the sink (it has an option to recreate the table before copying the records, which would solve the problem, since the table would be dropped and created again before each copy), but Cosmos DB for MongoDB is not supported as a data flow source.

Any idea how I can maintain a table in Azure Data Explorer that fetches data from a MongoDB collection every x hours, without ending up with duplicate records?


Solution

  • Since your requirement is to copy the data without duplicate records, you can use an Azure Data Explorer Command activity to clear the data in the Kusto table, and then use a Copy activity to copy the data from Azure Cosmos DB (MongoDB) into the Kusto database.

    • Add the Azure Data Explorer Command activity and give it the command:

    .clear table <table-name> data;

    Replace <table-name> with the actual sink table name.

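    If you author the pipeline in JSON rather than the designer, the command activity might look like the sketch below. The activity name, linked service name, and table name are placeholders, not values from the original setup.

    ```json
    {
        "name": "ClearKustoTable",
        "type": "AzureDataExplorerCommand",
        "linkedServiceName": {
            "referenceName": "AzureDataExplorerLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "command": ".clear table MyTable data"
        }
    }
    ```

    Note that `.clear table ... data` removes all rows but keeps the table schema, policies, and permissions intact, which is why the subsequent copy can ingest into it directly.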

    • Then add a Copy activity with the MongoDB collection as the source dataset and the Kusto table as the sink dataset.

    This way, the data is copied without any duplicates, and records deleted from the MongoDB collection disappear from the Kusto table after the next run, because the table is emptied before every copy.
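    Putting the two steps together, a minimal pipeline sketch could chain the Copy activity after the command activity with a `Succeeded` dependency, so the copy only runs once the table has been cleared. The pipeline, dataset, and activity names here are assumptions for illustration; the `CosmosDbMongoDbApiSource` and `AzureDataExplorerSink` types are the standard ADF copy source/sink types for these connectors.

    ```json
    {
        "name": "MongoToKustoPipeline",
        "properties": {
            "activities": [
                {
                    "name": "ClearKustoTable",
                    "type": "AzureDataExplorerCommand",
                    "linkedServiceName": {
                        "referenceName": "AzureDataExplorerLinkedService",
                        "type": "LinkedServiceReference"
                    },
                    "typeProperties": {
                        "command": ".clear table MyTable data"
                    }
                },
                {
                    "name": "CopyMongoToKusto",
                    "type": "Copy",
                    "dependsOn": [
                        {
                            "activity": "ClearKustoTable",
                            "dependencyConditions": [ "Succeeded" ]
                        }
                    ],
                    "inputs": [
                        { "referenceName": "MongoDbCollectionDataset", "type": "DatasetReference" }
                    ],
                    "outputs": [
                        { "referenceName": "KustoTableDataset", "type": "DatasetReference" }
                    ],
                    "typeProperties": {
                        "source": { "type": "CosmosDbMongoDbApiSource" },
                        "sink": { "type": "AzureDataExplorerSink" }
                    }
                }
            ]
        }
    }
    ```

    Triggering this pipeline every x hours with a schedule trigger gives a full refresh each run: clear, then copy, so neither duplicates nor stale deleted records survive.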