Very different query plans for almost identical queries

I am using PostgreSQL 10 + pg_trgm extension.

Table layout:

    Column    |       Type        | Collation | Nullable |        Default        | Storage  |
--------------+-------------------+-----------+----------+-----------------------+----------+
 id           | integer           |           | not null |                       | plain    | 
 reps         | integer           |           |          | 1                     | plain    | 
 user         | integer           |           |          |                       | plain    | 
 ip           | character varying |           | not null | ''::character varying | extended | 
 visittime    | integer           |           |          |                       | plain    | 
 domain       | character varying |           |          |                       | extended | 
 address      | text              |           |          |                       | extended | 
 method       | character varying |           |          |                       | extended | 
 mime         | character varying |           |          |                       | extended | 
 duration     | integer           |           |          |                       | plain    | 
 size         | bigint            |           |          |                       | plain    | 
 req_status   | integer           |           |          |                       | plain    | 
 http_status  | integer           |           |          |                       | plain    | 
 xproxymeta   | integer           |           |          |                       | plain    | 
Indexes:
    "http_requests_pkey" PRIMARY KEY, btree (id)
    "http_trgm_idx" gin (address gin_trgm_ops)
    "md_userid_idx" btree ("user" DESC)
    "md_visittime_idx" btree (visittime DESC)
Foreign-key constraints:
    "http_requests_user_fkey" FOREIGN KEY ("user") REFERENCES username_id(id)
Triggers:
    add_occupied_space_record_num AFTER INSERT ON http_requests FOR EACH ROW EXECUTE PROCEDURE add_occupied_space_record_num_func()
    count_repeated_records BEFORE INSERT ON http_requests FOR EACH ROW EXECUTE PROCEDURE count_repeated_records_func()
    delete_occupied_space_record_num AFTER DELETE ON http_requests FOR EACH ROW EXECUTE PROCEDURE delete_occupied_space_record_num_func()

Please note that there is a GIN Trigram full text index on address column. The table currently has ~10 million records.

Now these two identical queries result in very different plans. The first one takes 56ms and the second one takes ~25 seconds.

# explain analyze select * from http_requests where address ilike '%abc%' order by visittime desc limit 10;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..60.93 rows=10 width=209) (actual time=2.862..21.725 rows=10 loops=1)
   ->  Index Scan using md_visittime_idx on http_requests  (cost=0.43..500074.21 rows=82654 width=209) (actual time=2.861..21.719 rows=10 loops=1)
         Filter: (address ~~* '%abc%'::text)
         Rows Removed by Filter: 6663
 Planning time: 0.279 ms
 Execution time: 21.751 ms
(6 rows)

Now the the same query, with just a different search pattern: xyz

# explain analyze select * from http_requests where address ilike '%xyz%' order by visittime desc limit 10;
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5246.47..5246.50 rows=10 width=209) (actual time=23367.849..23367.870 rows=10 loops=1)
   ->  Sort  (cost=5246.47..5248.52 rows=818 width=209) (actual time=23367.846..23367.848 rows=10 loops=1)
         Sort Key: visittime DESC
         Sort Method: top-N heapsort  Memory: 34kB
         ->  Bitmap Heap Scan on http_requests  (cost=2090.34..5228.79 rows=818 width=209) (actual time=17.202..23352.607 rows=18926 loops=1)
               Recheck Cond: (address ~~* '%xyz%'::text)
               Heap Blocks: exact=18243
               ->  Bitmap Index Scan on http_trgm_idx  (cost=0.00..2090.14 rows=818 width=0) (actual time=12.342..12.342 rows=18926 loops=1)
                     Index Cond: (address ~~* '%xyz%'::text)
 Planning time: 0.190 ms
 Execution time: 23368.164 ms
(11 rows)

Why are the plans so different, and how can I fix the slow query?

Solution

PostgreSQL knows that there are many rows that contain the trigram abc, but few that contain the trigram xyz.

So it correctly decides that in the first case, it will pay to use the index that supports the ORDER BY clause and discard rows that don't match the ILIKE condition until 10 result rows are found.

That strategy would not work that well in the second case, because PostgreSQL would have to search a bigger part of the table before finding 10 results, so it decides to use the trigram index and sort the few results.

Unfortunately PostgreSQL's estimate is off (18926 instead of the estimated 818 rows), so the bitmap heap scan takes a long time.

Try 'ANALYZE'ing the table, perhaps with a higher default_statistics_target, to get a better estimate.

If all fails, change the condition to something like

WHERE address || '' ILIKE '%abc%'

to keep PostgreSQL from using the trigram index.