Search code examples
regexpostgresqlsql-likestring-matching

select rows if column contain name elements, any order


I have a table that contains a few thousands rows. These results have been produced by querying an api with a list of names.

For a search of "John Snow" the api would return all entries in their database which contained the token "john" and the token "snow". So I have a lot of false positives.

What I would like to do is to subset my table so that the I only have records where the column title (the one which contained the string matched against my search string) contains ALL elements of the searched string, regardless of the order.

I tried

select * from table where 'john snow' ~* title;

select * from table where 'john snow' ILIKE '%' || title || '%';

both work but only if the column title contains exactly john snow in that order.

Before

str_searched |  title
-------------+-------
john snow    | snow white
john snow    | john wick
john snow    | SNOW john
john snow    | john Snow
john snow    | Mr john snow

After

str_searched |  title
-------------+-------
john snow    | SNOW john
john snow    | john Snow
john snow    | Mr john snow

Solution

  • demo:db<>fiddle

    SELECT
        *
    FROM
        my_table
    WHERE
        regexp_split_to_array(lower('john snow'), ' ') 
            <@ regexp_split_to_array(lower(title), ' ')
    
    1. lower() normalizes the capital letters
    2. regexp_split_to_array() tranforms the text into an array at the space
    3. Now you have 2 arrays which can be compared (e.g. [john,snow] and [mr,john,snow]). The <@ comparator checks if the left array is complete contained by the right one.


    Disclaimer: On really huge tables this might be slow. In that case I would recommend to take a look a the full text search engine: https://www.postgresql.org/docs/current/textsearch.html