My results returned in PostgreSQL full-text search are very slow, typically 5-25 seconds for a query.
My query is as follows:
SELECT tp.*
FROM diff_profile tp
JOIN diff_profile_full_text_document AS full_text_document ON tp.id = full_text_document.profile_id
WHERE to_tsquery( 'social & media' ) @@ (full_text_document.rankings) AND (ts_rank_cd(full_text_document.rankings, to_tsquery( 'social & media' )) > 0 )
ORDER BY ts_rank_cd(full_text_document.rankings, to_tsquery('social & media')) DESC
LIMIT 1000
EXPLAIN ANALYZE:
"Limit (cost=149011.43..149128.11 rows=1000 width=458) (actual time=25698.214..25699.022 rows=1000 loops=1)"
" Buffers: shared hit=2016881 read=632773"
" -> Gather Merge (cost=149011.43..149960.47 rows=8134 width=458) (actual time=25698.210..25759.028 rows=1000 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" Buffers: shared hit=5903977 read=1845055"
" -> Sort (cost=148011.41..148021.58 rows=4067 width=458) (actual time=25688.667..25689.008 rows=479 loops=3)"
" Sort Key: (ts_rank_cd(full_text_document.rankings, to_tsquery('social & media'::text))) DESC"
" Sort Method: top-N heapsort Memory: 568kB"
" Worker 0: Sort Method: top-N heapsort Memory: 572kB"
" Worker 1: Sort Method: top-N heapsort Memory: 553kB"
" Buffers: shared hit=5903977 read=1845055"
" -> Nested Loop (cost=322.87..147788.42 rows=4067 width=458) (actual time=331.940..25394.498 rows=202972 loops=3)"
" Buffers: shared hit=5903961 read=1845055"
" -> Parallel Bitmap Heap Scan on diff_profile_full_text_document full_text_document (cost=322.31..112574.79 rows=4067 width=340) (actual time=331.745..19330.030 rows=202972 loops=3)"
" Recheck Cond: (to_tsquery('social & media'::text) @@ rankings)"
" Rows Removed by Index Recheck: 3043633"
" Filter: (ts_rank_cd(rankings, to_tsquery('social & media'::text)) > '0'::double precision)"
" Heap Blocks: exact=19968 lossy=159568"
" Buffers: shared hit=2583041 read=1086064"
" -> Bitmap Index Scan on diff_profile_full_text_document_gin_ranking_index (cost=0.00..319.87 rows=29282 width=0) (actual time=324.090..324.090 rows=608915 loops=1)"
" Index Cond: (to_tsquery('social & media'::text) @@ rankings)"
" Buffers: shared hit=293 read=541"
" -> Index Scan using output_profile_pkey2 on diff_profile tp (cost=0.56..8.41 rows=1 width=454) (actual time=0.015..0.015 rows=1 loops=608915)"
" Index Cond: (id = full_text_document.profile_id)"
" Buffers: shared hit=2287324 read=758991"
"Planning Time: 1.055 ms"
"Execution Time: 25759.553 ms"
Updated with new query and increased work_mem
"Nested Loop (cost=114825.29..123517.90 rows=1000 width=454) (actual time=8006.457..8020.363 rows=1000 loops=1)"
" Buffers: shared hit=617083 read=337225"
" -> Limit (cost=114824.72..114941.40 rows=1000 width=12) (actual time=8006.399..8006.852 rows=1000 loops=1)"
" Buffers: shared hit=614368 read=334936"
" -> Gather Merge (cost=114824.72..115773.76 rows=8134 width=12) (actual time=8006.397..8031.989 rows=1000 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" Buffers: shared hit=1683284 read=909769"
" -> Sort (cost=113824.70..113834.87 rows=4067 width=12) (actual time=7997.708..7997.805 rows=796 loops=3)"
" Sort Key: (ts_rank_cd(full_text_document.rankings, to_tsquery('social & media'::text))) DESC"
" Sort Method: top-N heapsort Memory: 123kB"
" Worker 0: Sort Method: top-N heapsort Memory: 123kB"
" Worker 1: Sort Method: top-N heapsort Memory: 123kB"
" Buffers: shared hit=1683284 read=909769"
" -> Parallel Bitmap Heap Scan on diff_profile_full_text_document full_text_document (cost=322.31..113601.71 rows=4067 width=12) (actual time=475.461..7884.962 rows=202972 loops=3)"
" Recheck Cond: (to_tsquery('social & media'::text) @@ rankings)"
" Filter: (ts_rank_cd(rankings, to_tsquery('social & media'::text)) > '0'::double precision)"
" Heap Blocks: exact=191318"
" Buffers: shared hit=1683272 read=909765"
" -> Bitmap Index Scan on diff_profile_full_text_document_gin_ranking_index (cost=0.00..319.87 rows=29282 width=0) (actual time=339.534..339.534 rows=608915 loops=1)"
" Index Cond: (to_tsquery('social & media'::text) @@ rankings)"
" Buffers: shared hit=293 read=541"
" -> Index Scan using output_profile_pkey2 on diff_profile tp (cost=0.56..8.57 rows=1 width=454) (actual time=0.013..0.013 rows=1 loops=1000)"
" Index Cond: (id = full_text_document.profile_id)"
" Buffers: shared hit=2715 read=2289"
"Planning Time: 1.094 ms"
"Execution Time: 8045.787 ms"
Updated with IO timings (64M work_mem and new query)
"Nested Loop (cost=120482.74..129175.35 rows=1000 width=454) (actual time=10603.522..10619.940 rows=1000 loops=1)"
" Buffers: shared hit=857647 read=306515"
" I/O Timings: read=2124.818"
" -> Limit (cost=120482.18..120598.85 rows=1000 width=16) (actual time=10603.405..10603.921 rows=1000 loops=1)"
" Buffers: shared hit=854932 read=304227"
" I/O Timings: read=2113.927"
" -> Gather Merge (cost=120482.18..122063.82 rows=13556 width=16) (actual time=10603.400..10705.650 rows=1000 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" Buffers: shared hit=2563341 read=909745"
" I/O Timings: read=6413.484"
" -> Sort (cost=119482.15..119499.10 rows=6778 width=16) (actual time=10595.426..10595.479 rows=786 loops=3)"
" Sort Key: (ts_rank_cd(full_text_document.rankings, to_tsquery('social <-> media'::text))) DESC, (ts_rank_cd(full_text_document.rankings, to_tsquery('social & media'::text))) DESC"
" Sort Method: top-N heapsort Memory: 122kB"
" Worker 0: Sort Method: top-N heapsort Memory: 123kB"
" Worker 1: Sort Method: top-N heapsort Memory: 122kB"
" Buffers: shared hit=2563341 read=909745"
" I/O Timings: read=6413.484"
" -> Parallel Bitmap Heap Scan on diff_profile_full_text_document full_text_document (cost=323.93..119110.52 rows=6778 width=16) (actual time=463.799..10479.477 rows=188245 loops=3)"
" Recheck Cond: (to_tsquery('social & media'::text) @@ rankings)"
" Filter: ((ts_rank_cd(rankings, to_tsquery('social <-> media'::text)) > '0'::double precision) OR (ts_rank_cd(rankings, to_tsquery('social & media'::text)) >= '0.2'::double precision))"
" Rows Removed by Filter: 14727"
" Heap Blocks: exact=175093"
" Buffers: shared hit=2563325 read=909745"
" I/O Timings: read=6413.484"
" -> Bitmap Index Scan on diff_profile_full_text_document_gin_ranking_index (cost=0.00..319.87 rows=29282 width=0) (actual time=326.170..326.170 rows=608915 loops=1)"
" Index Cond: (to_tsquery('social & media'::text) @@ rankings)"
" Buffers: shared hit=293 read=541"
" I/O Timings: read=6.623"
" -> Index Scan using output_profile_pkey2 on diff_profile tp (cost=0.56..8.57 rows=1 width=454) (actual time=0.015..0.015 rows=1 loops=1000)"
" Index Cond: (id = full_text_document.profile_id)"
" Buffers: shared hit=2715 read=2288"
" I/O Timings: read=10.891"
"Planning Time: 7.341 ms"
"Execution Time: 10722.398 ms"
I would appreciate any help on this. Thanks in advance.
It looks like every row in "full_text_document" resolves to exactly one row in "tp". If that is known to always be the case, then since the WHERE and the ORDER BY don't depend on "tp" at all, you could do the LIMIT first and then join.
SELECT tp.*
FROM diff_profile tp
JOIN (
SELECT profile_id from diff_profile_full_text_document AS full_text_document
WHERE to_tsquery( 'coffee' ) @@ (full_text_document.rankings) AND (ts_rank_cd(full_text_document.rankings, to_tsquery( 'coffee' )) > 0 )
ORDER BY ts_rank_cd(full_text_document.rankings, to_tsquery('coffee')) DESC
LIMIT 1000
) ftd ON tp.id = ftd.profile_id
It looks like that would shave off about half the time. To go much beyond that, you probably need to make your queries more selective. Retrieving and ordering hundreds of thousands of rows just to return 1000 of them is never going to be very efficient.