I am trying to find missing records from source and target(Sink) file. Above is an example of a CSV file.
Important Points:
Below is the print screen of the empty output:
Exists Transformation is not able to find the missing records when sink has more records, because it is designed to compare the left stream with the right stream. If the right stream has more records than the left stream, then using only the Exists Transformation may not be able to find the missing records. One way to achieve the requirement is to use join transformation with join type as full outer join
. Full outer join outputs all columns and rows from both sides with NULL values for columns that are not matched. You can get the records which are present in source and not in sink, by using the filter transformation with the join transformation output as input and filter condition as isNull(sink_col_name)
. Similarly, to get the records that are present in sink and not in source, you can filter with the condition isNull(source_col_name)
.
Below is the sample dataflow with Join and filter transformations to find the missing records in source and sink.
Join transformation output:
Filter1 settings:
Filter1 Output: Missing records in sink.
Filter2 settings:
Filter2 Output: Missing records in source.