I am trying to compare the contents of two SQL tables in Azure Data Factory (ADF) using the Exists transformation in the data flow to find the delta data. My goal is to move the delta data to the sink. Here are the details:
Approach:
In my ADF data flow, for each table, I use the sha2() function to generate a derived hash column like this: derivedColumn = SHA2(256, columns())
The underlying business case is that the rows could be the same, but the data within the rows may change over time. Hence, I am comparing the entire content by calculating a hash column on all the columns instead of just comparing unique columns.
I then use the Exists transformation to find rows in Table 1 that do not exist in Table 2.
The delta data is supposed to be upserted to the sink.
Problem:
When I preview the data in the Exists transformation, I see a lot of rows as a result. In my case, it should return zero rows since Table 1 (left side table) has fewer rows than Table 2, and I know for a fact that all these rows and data are already present in Table 2.
Question:
Additional Information:
Any insights or suggestions would be highly appreciated. Thank you!
Screenshots of the dataflow:
This issue in dataflow data preview is not actual error. The reason for data mismatch in data preview is because, in debug settings, source1 and source2 are limited to random 1000 rows for data preview. This is the reason for not getting the expected result. When you debug the pipeline with this dataflow, you will not get this error.
Refer MS doc regarding this debug settings.