Search code examples
azure-data-factory

ADF fuzzy left join returns fewer rows than expected


I created a dataflow in ADF in which I am using a fuzzy left outer self join to compare customer names and identify duplicates. I used a sample of the data in the source SQL (2 customers only where both names match 100%). On testing, the regular left join where AccountName == AccountName works as expected and returns 4 records because both names match.

However, when using fuzzy left outer join at 65% match threshold, only 2 of the 4 records are returned. The expectation is that each join would result in the same 4 records.

How does this need to be setup for the fuzzy join to return the same number of records for this sample if I am doing this correctly? I suspect this is a bug with the fuzzy left outer join in ADF.

Here is the data:

Acct_Id AcctName
5799930 RENTAL SERVICE
5799940 RENTAL SERVICE

Flow: Flow

LeftOuter join results:

FuzzyLeft join results: FuzzyLeft join results

LeftOuter join conditions LeftOuter join conditions

FuzzyLeft join conditions FuzzyLeft join conditions


Solution

  • I also tried the same I have 2 records in left table and 4 records in right table.

    • When I left joined it, I got 4 records as I have duplicate records in right table as below: enter image description here
    • In fuzzy joint it is returning the records with common values only as below: enter image description here

    Check if you have any duplicate records or try first convert the datatype of column to string and the use it in join.

    enter image description here

    If still facing same might be issue with matching algorithm, then it's better to raise a Support ticket for deeper investigation on this issue.