...and is it something I should do anyway?
From my brief testing, making a trigram index and searching using
where name like '%query%'
is faster than
where name ilike '%query%'
So it seems like I should do it, but I'm surprised I've not been able to find out how.
(My test data is fairly homogenous - 1.5M rows made up of 16 entries repeated. I can imagine this might mess with the results.)
This is how I expected it to work (note the lower(name)
):
create extension pg_trgm;
create table users(name text);
insert into users values('Barry');
create index "idx" on users using gin (lower(name) gin_trgm_ops);
select count(*) from users where (name like '%bar%');
but this returns 0
.
Either of
select count(*) from users where (name like '%Bar%');
or
select count(*) from users where (name ilike '%bar%');
work, which makes me believe the trigrams in the index are not lower()
'd. Am I misunderstanding how this works under the hood? Is it not possible to call lower
there?
I note that this
select show_trgm('Barry')
returns lowercase trigrams:
{" b"," ba",arr,bar,rry,"ry "}
So I'm perplexed.
The trigrams are definitely lower case.
The conundrum becomes cleared up when you consider how trigram indexes are used: they act as a filter that eliminates the majority of non-matches, but allow false positive results (among other reasons is their case insensitivity). That's why there always has to be a recheck to eliminate those false positives, and that us why you always get a bitmap index scan.
The ILIKE
query may be slower because it has more results, or because case insensitive comparisons require more effort.