Search code examples
sqloraclestring-comparisondata-analysis

Find string similarities between two dimensions in SQL


I have two tables and I want to find matches where values can be found in one of the tables and where they are in the second.

In table A I have a list over search queries by users, and in table B I have a list over a selection of search queries I want to find. To make this work I want to use a method similar to:

SELECT UTL_MATCH.JARO_WINKLER_SIMILARITY('shackleford', 'shackelford') FROM DUAL

I have used this method, but it does not work as it can be a difference between the query and the name in selection.

SELECT query FROM search_log WHERE query IN (SELECT navn FROM selection_table);

Are there any best practice methods for finding similarities through a query?


Solution

  • One approach might be something like:

    SELECT 
        SEARCH_LOG.QUERY
    FROM 
        SEARCH_LOG
    WHERE 
        EXISTS
        (
            SELECT 
                NULL 
            FROM 
                SELECTION_TABLE
            WHERE
                UTL_MATCH.JARO_WINKLER_SIMILARITY(SEARCH_LOG.QUERY, SELECTION_TABLE.NAVN) >= 98
        );
    

    This will return rows in SEARCH_LOG that have a row in SELECTION_TABLE where NAVN matches QUERY with a score of at least 98 (out of 100). You could change the 98 to whatever threshold you prefer.

    This is a "brute force" approach because it potentially looks at all combinations of rows. So, it might not be "best practice", but it might still be practical. If performance is important, you might consider a more sophisticated solution like Oracle Text.