So basically I set up a very simple test table to test trigram and fulltext indexing capabilities in postgresql 9.1 (stock Debian stable).
Here are the table and index definitions:
-- Table: fulltextproba
-- DROP TABLE fulltextproba;
CREATE TABLE fulltextproba
(
id integer NOT NULL,
text text,
varchar600 character varying(600) COLLATE pg_catalog."C.UTF-8",
CONSTRAINT id PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
-- Index: id_index
-- DROP INDEX id_index;
CREATE UNIQUE INDEX id_index
ON fulltextproba
USING btree
(id );
-- Index: text_gin_fulltext_hun
-- DROP INDEX text_gin_fulltext_hun;
CREATE INDEX text_gin_fulltext_hun
ON fulltextproba
USING gin
(to_tsvector('hungarian'::text, text) );
-- Index: text_gin_trgm
-- DROP INDEX text_gin_trgm;
CREATE INDEX text_gin_trgm
ON fulltextproba
USING gin
(text COLLATE pg_catalog."C.UTF-8" gin_trgm_ops);
-- Index: varchar600
-- DROP INDEX varchar600;
CREATE INDEX varchar600
ON fulltextproba
USING btree
(varchar600 COLLATE pg_catalog."C.UTF-8" varchar_pattern_ops);
-- Index: varchar600_gin_trgm
-- DROP INDEX varchar600_gin_trgm;
CREATE INDEX varchar600_gin_trgm
ON fulltextproba
USING gin
(varchar600 COLLATE pg_catalog."C.UTF-8" gin_trgm_ops);
My problem is, if I do a %foo%
search that should use the trigram index, If I search on the text column, it doesn't:
SELECT COUNT(id) FROM public.fulltextproba WHERE text LIKE '%almáv%'
count
-------
396
(1 row)
real 0m7.215s
user 0m0.020s
sys 0m0.004s
QUERY PLAN
---------------------------------------------------------------------------
Aggregate (cost=657056.11..657056.12 rows=1 width=4)
-> Seq Scan on fulltextproba (cost=0.00..657052.72 rows=1355 width=4)
Filter: (text ~~ '%almáv%'::text)
(3 rows)
But if I search in the varchar600 column, it does use the trigram index, and is - not suprisingly - much faster:
SELECT COUNT(id) FROM public.fulltextproba WHERE varchar600 LIKE '%almáv%'
count
-------
373
(1 row)
real 0m0.184s
user 0m0.052s
sys 0m0.004s
QUERY PLAN
--------------------------------------------------------------------------------------------
Aggregate (cost=5283.11..5283.12 rows=1 width=4)
-> Bitmap Heap Scan on fulltextproba (cost=62.50..5279.73 rows=1355 width=4)
Recheck Cond: ((varchar600)::text ~~ '%almáv%'::text)
-> Bitmap Index Scan on varchar600_gin_trgm (cost=0.00..62.16 rows=1355 width=0)
Index Cond: ((varchar600)::text ~~ '%almáv%'::text)
(5 rows)
So the ultimate questions are:
text
is perfectly fine. The best option even, as you can see in the EXPLAIN
output:
Index Cond: ((varchar600)::text ~~ '%almáv%'::text)
The immediate reason is probably a collation mismatch. Your table is defined:
text text, -- default collation is ???
varchar600 character varying(600) COLLATE pg_catalog."C.UTF-8"
While both indexes use COLLATE pg_catalog."C.UTF-8"
. What is your default collation? The output from:
SHOW LC_COLLATE;
You may be mixing different collations. Retest with:
SELECT COUNT(id) FROM public.fulltextproba
WHERE text COLLATE pg_catalog."C.UTF-8" LIKE '%almáv%'
Read about collation support in Postgres.
You obviously have different values in either of the columns. Repeat the test with identical values.
To force Postgres into using the index, you can (for debugging in your session only!):
SET enable_seqscan = off;
Then try again. Details:
The upcoming Postgres 9.4 is shipping with a number of substantial improvements for GIN indexes. In particular, they are going to be much smaller and faster.