Search code examples
sqlsql-serverpattern-matchingequivalent

String Pattern Matching for Limited, Ltd, Incorporated, Inc, Etc


We're doing a LOT of work towards trying to reconcile about 1,000 duplicate manufacturer names and 1,000,000 duplicate part numbers. One thing that has come up is how to "match" things like "Limited" vs. "Ltd." vs. "Ltd"

The purpose is for the application to reconcile these matched items into a standard format. So:

ACME Ltd. ACME Limited ACME Ltd

Should all be reconciled into ACME Ltd.

This will also be used to prevent entering additional duplicates in the future.

Any suggestions on how to accomplish this pattern matching in SQL Server? Any known algorithms to find items with mapped equivalencies, etc...?

Thanks!

Eric.


Solution

  • How about a table that lists what you want in one column and variations in the next?

    Ltd   Limited 
    Ltd   Ltd.
    St    Street
    St    Str.
    

    Then, if you find a match on the second column, you change it to the first. It may take several iterations, as you find other alternatives.