Search code examples
azureazure-data-factoryazure-synapse

Compare two tables to find missing rows dynamically in Azure synapse data flow


enter image description hereI am trying to find missing records from source and target(Sink) file. Above is an example of a CSV file.

Important Points:

  1. I am trying to get the missing data in both(source/sink) CSV files using Azure Dataflow Exists Transformation.
  2. If the source have more record(for example 15 records) and the sink have fewer records(for example 10 record) I am getting the missing records(missing 5 records in output) from both CSV files But if the source have fewer(for example 10 record) and the sink has more records (for example 15 record) that time I am not getting any records as output. I am expecting missing records here also.

Below is the print screen of the empty output:

enter image description here


Solution

  • Exists Transformation is not able to find the missing records when sink has more records, because it is designed to compare the left stream with the right stream. If the right stream has more records than the left stream, then using only the Exists Transformation may not be able to find the missing records. One way to achieve the requirement is to use join transformation with join type as full outer join. Full outer join outputs all columns and rows from both sides with NULL values for columns that are not matched. You can get the records which are present in source and not in sink, by using the filter transformation with the join transformation output as input and filter condition as isNull(sink_col_name). Similarly, to get the records that are present in sink and not in source, you can filter with the condition isNull(source_col_name).

    Below is the sample dataflow with Join and filter transformations to find the missing records in source and sink.

    Join transformation output:

    enter image description here

    • New branch is created in Join transformation and in each branch, filter transformation is added.

    Filter1 settings:

    enter image description here

    Filter1 Output: Missing records in sink.

    enter image description here

    Filter2 settings:

    enter image description here

    Filter2 Output: Missing records in source.

    enter image description here