Search code examples
postgresqljoinstring-matchingsimilarityfuzzy

How can I use ~ to fuzzy match two fields of a table?


I'm trying to perform a join over two tables that contain info about the same companies, but sometimes the companies are stored with slightly different names (e.g. table 1: Company X -> Table 2: Company X and Friends). My idea was to full join each row of table 1 with each row of table 2 and then using ~ to filter (e.g. where name1 ~ name2 or name2 ~ name1), but that's not possible.

Do someone has a solution to this matching problem? Thanks!


Solution

  • You could join on the condition that either name field be a substring of the other's name field:

    SELECT t1.*, t2.*
    FROM table1 t1
    INNER JOIN table2 t2
        ON t1.name LIKE '%' || t2.name || '%' OR
           t2.name LIKE '%' || t1.name || '%';
    

    This approach does not even require regex. We could use regex here, if we wanted to ensure that one table's name only appears as a substring of the other's name and is also a word. But, maybe you don't even need to do this.