I'm currently optimizing my search results on jsonb fields of PostgreSQL. I'm using Postgres 9.6. My ultimate goal is to search on multiple fields within my jsonb document and rank the results according to their total hits in all fields. But I'm stuck because the ts_rank function doesn't use my index and slowing down the search immensely. Here is a minimal example:
CREATE TABLE book (
id BIGSERIAL NOT NULL,
data JSONB NOT NULL
);
CREATE INDEX book_title_idx
ON book USING GIN (to_tsvector('english', book.data ->> 'title'));
INSERT INTO book (data)
VALUES (CAST('{"title": "Cats"}' AS JSONB));
When trying to search on the title field I'm using this query:
EXPLAIN ANALYZE
SELECT *
FROM (
SELECT
id,
data ->> 'title' AS title,
ts_rank(title_query, 'cat:*') AS score
FROM
book,
to_tsvector('english', data ->> 'title') title_query
WHERE title_query @@ to_tsquery('cat:*')
ORDER BY score DESC) a
WHERE score > 0
ORDER BY score DESC;
Without ranking the search on my real data takes < 1ms, with the ranking it's ~1800ms. It gets worse the more fields I search on. I need the ranking only to make hits in multiple fields more valueable.
Your query gives the plan (on a test dataset with 500000 rows):
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=216058.57..217308.57 rows=500001 width=63) (actual time=831.033..831.033 rows=1 loops=1)
Sort Key: (ts_rank(title_query.title_query, '''cat'':*'::tsquery)) DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.25..149927.55 rows=500001 width=63) (actual time=4.410..830.950 rows=1 loops=1)
-> Seq Scan on book (cost=0.00..8677.01 rows=500001 width=31) (actual time=0.024..30.159 rows=500001 loops=1)
-> Function Scan on to_tsvector title_query (cost=0.25..0.52 rows=1 width=32) (actual time=0.001..0.001 rows=0 loops=500001)
Filter: ((ts_rank(title_query, '''cat'':*'::tsquery) > '0'::double precision) AND (title_query @@ to_tsquery('cat:*'::text)))
Rows Removed by Filter: 1
Planning time: 37.211 ms
Execution time: 831.279 ms
(10 rows)
Replace the alias title_query
in WHERE
clause with the expression used in the index definition:
EXPLAIN ANALYZE
SELECT *
FROM (
SELECT
id,
data ->> 'title' AS title,
ts_rank(title_query, 'cat:*') AS score
FROM
book,
to_tsvector('english', data ->> 'title') title_query
WHERE to_tsvector('english', data ->> 'title') @@ to_tsquery('cat:*')
ORDER BY score DESC) a
WHERE score > 0
ORDER BY score DESC;
Sort (cost=9905.39..9930.39 rows=10000 width=63) (actual time=1.069..1.069 rows=1 loops=1)
Sort Key: (ts_rank(title_query.title_query, '''cat'':*'::tsquery)) DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=114.00..9241.00 rows=10000 width=63) (actual time=1.049..1.050 rows=1 loops=1)
-> Bitmap Heap Scan on book (cost=113.75..8940.75 rows=10000 width=31) (actual time=0.052..0.052 rows=1 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (data ->> 'title'::text)) @@ to_tsquery('cat:*'::text))
Heap Blocks: exact=1
-> Bitmap Index Scan on book_title_idx (cost=0.00..111.25 rows=10000 width=0) (actual time=0.047..0.047 rows=1 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (data ->> 'title'::text)) @@ to_tsquery('cat:*'::text))
-> Function Scan on to_tsvector title_query (cost=0.25..0.27 rows=1 width=32) (actual time=0.994..0.994 rows=1 loops=1)
Filter: (ts_rank(title_query, '''cat'':*'::tsquery) > '0'::double precision)
Planning time: 0.639 ms
Execution time: 1.120 ms
(13 rows)