I created a dataflow in ADF in which I am using a fuzzy left outer self join to compare customer names and identify duplicates. I used a sample of the data in the source SQL (2 customers only where both names match 100%). On testing, the regular left join where AccountName == AccountName works as expected and returns 4 records because both names match.
However, when using fuzzy left outer join at 65% match threshold, only 2 of the 4 records are returned. The expectation is that each join would result in the same 4 records.
How does this need to be setup for the fuzzy join to return the same number of records for this sample if I am doing this correctly? I suspect this is a bug with the fuzzy left outer join in ADF.
Here is the data:
Acct_Id | AcctName |
---|---|
5799930 | RENTAL SERVICE |
5799940 | RENTAL SERVICE |
Flow:
LeftOuter join results:
FuzzyLeft join results:
LeftOuter join conditions
FuzzyLeft join conditions
I also tried the same I have 2 records in left table and 4 records in right table.
Check if you have any duplicate records or try first convert the datatype of column to string and the use it in join.
If still facing same might be issue with matching algorithm, then it's better to raise a Support ticket for deeper investigation on this issue.