Search code examples
sqldatabaseoracle-databasequery-optimizationsql-like

Use of LIKE in Oracle with high number of variables


On Oracle database, I have individual queries which I would like to add based on substitution in the search word. Consider each word can have its word substitution and additional synonym substitution. For instance, the word 4MULA has word substitution such as formula, fourmula. The word VIEW has word substitution such as view, vu and Education has synonyms such as university, college, school

The structure of the where has the following order (3 words):

select * 
from dual 
where name like '%4MULA VIEW%' || ' %UNIVERSITY% ' 

Each word has to be substitute by the all possible values to replace it. I have done individual queries, but then I have an excessive amount of queries (about 500 individual queries replacing all possible substitutions). For instance, another individual query would be:

select *
from dual
where name like '%FOURMULA VU%' || ' %SCHOOL% ' 

The final number of queries depends on the total number of permutations:

FORMULA VIEW UNIVERSITY
4MULA VIEW UNIVERSITY
FOURMULA VIEW UNIVERSITY
FORMULA VU UNIVERSITY
4MULA VU UNIVERSITY
FOURMULA VU UNIVERSITY
...
FOURMULA VU SCHOOL

To optimize time I added all possible permutations using OR condition. For instance, I have added as follows:

select * 
from dual 
where name like '%4MULA VIEW%' || ' %UNIVERSITY% ' 
      OR name like '%FOURMULA VU%' || ' %SCHOOL% ' 
      ...
      OR name like '%FOURMULA VU%' || ' %SCHOOL '

What is the best way to add all permutations in a single query?


Solution

  • You can use object type and collections to create a list of synonyms for each word and then create a user-defined function to handle the substitutions:

    Oracle Setup:

    CREATE TYPE wordlist IS TABLE OF VARCHAR2(30);
    
    CREATE TYPE word_synonyms IS OBJECT(
      word     VARCHAR2(30),
      synonyms wordlist
    );
    
    CREATE TYPE word_synonyms_list IS TABLE OF word_synonyms;
    
    CREATE FUNCTION replace_synonyms(
      text         IN VARCHAR2,
      synonym_list IN word_synonyms_list
    ) RETURN VARCHAR2 DETERMINISTIC
    IS
      p_text VARCHAR2(4000) := text;
    BEGIN
      FOR i IN 1 .. synonym_list.COUNT LOOP
        FOR j IN 1 .. synonym_list(i).synonyms.COUNT LOOP
          p_text := REGEXP_REPLACE(
                      p_text,
                      '(^|[^a-z0-9])' || synonym_list(i).synonyms(j) || '($|[^a-z0-9])',
                      '\1' || synonym_list(i).word || '\2',
                      1,
                      0,
                      'i'
                    );
          -- Handle repeated synonyms.
          p_text := REGEXP_REPLACE(
                      p_text,
                      '(^|[^a-z0-9])' || synonym_list(i).synonyms(j) || '($|[^a-z0-9])',
                      '\1' || synonym_list(i).word || '\2',
                      1,
                      0,
                      'i'
                    );
        END LOOP;
      END LOOP;
      RETURN p_text;
    END;
    /
    

    Test Data:

    CREATE TABLE test_data ( text ) AS
    SELECT 'FORMULA VIEW UNIVERSITY'    FROM DUAL UNION ALL
    SELECT '4MULA VIEW UNIVERSITY'      FROM DUAL UNION ALL
    SELECT 'FOURMULA VIEW UNIVERSITY'   FROM DUAL UNION ALL
    SELECT 'FORMULA VU UNIVERSITY'      FROM DUAL UNION ALL
    SELECT '4MULA VU UNIVERSITY'        FROM DUAL UNION ALL
    SELECT 'FOURMULA VU UNIVERSITY'     FROM DUAL UNION ALL
    SELECT 'FOURMULA VU SCHOOL'         FROM DUAL UNION ALL
    SELECT 'FOURMULAE VULCAN SCHOOLING' FROM DUAL;
    

    Query:

    SELECT text,
           replace_synonyms(
             text,
             word_synonyms_list(
               word_synonyms( '4MULA',      wordlist( 'formula', 'fourmula' ) ),
               word_synonyms( 'VIEW',       wordlist( 'vu' ) ),
               word_synonyms( 'UNIVERSITY', wordlist( 'school' ) )
             )
           ) AS substitutions
    FROM   test_data
    

    outputs:

    TEXT                       | SUBSTITUTIONS             
    :------------------------- | :-------------------------
    FORMULA VIEW UNIVERSITY    | 4MULA VIEW UNIVERSITY     
    4MULA VIEW UNIVERSITY      | 4MULA VIEW UNIVERSITY     
    FOURMULA VIEW UNIVERSITY   | 4MULA VIEW UNIVERSITY     
    FORMULA VU UNIVERSITY      | 4MULA VIEW UNIVERSITY     
    4MULA VU UNIVERSITY        | 4MULA VIEW UNIVERSITY     
    FOURMULA VU UNIVERSITY     | 4MULA VIEW UNIVERSITY     
    FOURMULA VU SCHOOL         | 4MULA VIEW UNIVERSITY     
    FOURMULAE VULCAN SCHOOLING | FOURMULAE VULCAN SCHOOLING
    

    and:

    SELECT text
    FROM   test_data
    WHERE  replace_synonyms(
             text,
             word_synonyms_list(
               word_synonyms( '4MULA',      wordlist( 'formula', 'fourmula' ) ),
               word_synonyms( 'VIEW',       wordlist( 'vu' ) ),
               word_synonyms( 'UNIVERSITY', wordlist( 'school' ) )
             )
           )
           LIKE '%4MULA VIEW%%UNIVERSITY%'
    

    outputs:

    | TEXT                     |
    | :----------------------- |
    | FORMULA VIEW UNIVERSITY  |
    | 4MULA VIEW UNIVERSITY    |
    | FOURMULA VIEW UNIVERSITY |
    | FORMULA VU UNIVERSITY    |
    | 4MULA VU UNIVERSITY      |
    | FOURMULA VU UNIVERSITY   |
    | FOURMULA VU SCHOOL       |
    

    db<>fiddle here


    Currently, in test_data table I'd have just a record such as FORMULA VIEW UNIVERSITY. In addition, I have a table for word substitution and synonyms in very similar fashion you included. Considering I have just the record FORMULA VIEW UNIVERSITY how could I invoke replace_synonyms such as: SELECT text FROM test_data WHERE replace_synonyms(...)) LIKE '%4MULA VU%%UNIVERSITY%'. Notice that usually I will want to compare a new name which is not in test_data as the case of '%4MULA VU%%UNIVERSITY%'

    Swap it round so that the synonyms are normalised in the LIKE filter and not in the text in the table:

    CREATE TABLE test_data ( text ) AS
    SELECT 'FORMULA VIEW UNIVERSITY'    FROM DUAL;
    

    Then:

    SELECT text
    FROM   test_data
    WHERE  text LIKE replace_synonyms(
             '%4MULA VU%%SCHOOL%',
             word_synonyms_list(
               word_synonyms( 'FORMULA',      wordlist( '4mula', 'fourmula' ) ),
               word_synonyms( 'VIEW',       wordlist( 'vu' ) ),
               word_synonyms( 'UNIVERSITY', wordlist( 'school' ) )
             )
           )
    

    Outputs:

    | TEXT                    |
    | :---------------------- |
    | FORMULA VIEW UNIVERSITY |
    

    db<>fiddle here