I have a table that contains a few thousands rows. These results have been produced by querying an api with a list of names.
For a search of "John Snow" the api would return all entries in their database which contained the token "john" and the token "snow". So I have a lot of false positives.
What I would like to do is to subset my table so that the I only have records where the column title
(the one which contained the string matched against my search string) contains ALL elements of the searched string, regardless of the order.
I tried
select * from table where 'john snow' ~* title;
select * from table where 'john snow' ILIKE '%' || title || '%';
both work but only if the column title
contains exactly john snow
in that order.
Before
str_searched | title
-------------+-------
john snow | snow white
john snow | john wick
john snow | SNOW john
john snow | john Snow
john snow | Mr john snow
After
str_searched | title
-------------+-------
john snow | SNOW john
john snow | john Snow
john snow | Mr john snow
SELECT
*
FROM
my_table
WHERE
regexp_split_to_array(lower('john snow'), ' ')
<@ regexp_split_to_array(lower(title), ' ')
lower()
normalizes the capital lettersregexp_split_to_array()
tranforms the text into an array at the space[john,snow]
and [mr,john,snow]
). The <@
comparator checks if the left array is complete contained by the right one.