
Prevent usage of index for a particular query in Postgres


I have a slow query in a Postgres DB. Using EXPLAIN ANALYZE, I can see that Postgres performs a bitmap index scan on two different indexes, followed by a BitmapAnd of the two resulting sets.

Deleting one of the indexes makes the query ten times faster (a bitmap index scan is still used on the first index). However, that deleted index is useful in other queries.

Query:

select
  booking_id
from
  booking
where
  substitute_confirmation_token is null
  and date_trunc('day', from_time) >= cast('01/25/2016 14:23:00.004' as date)
  and from_time >= '01/25/2016 14:23:00.004'
  and type = 'LESSON_SUBSTITUTE'
  and valid
order by
  booking_id;

Indexes:

"idx_booking_lesson_substitute_day" btree (date_trunc('day'::text, from_time)) WHERE valid AND type::text = 'LESSON_SUBSTITUTE'::text
"booking_substitute_confirmation_token_key" UNIQUE CONSTRAINT, btree (substitute_confirmation_token)

Query plan:

Sort  (cost=287.26..287.26 rows=1 width=8) (actual time=711.371..711.377 rows=44 loops=1)
  Sort Key: booking_id
  Sort Method: quicksort  Memory: 27kB
  Buffers: shared hit=8 read=7437 written=1
  ->  Bitmap Heap Scan on booking  (cost=275.25..287.25 rows=1 width=8) (actual time=711.255..711.294 rows=44 loops=1)
        Recheck Cond: ((date_trunc('day'::text, from_time) >= '2016-01-25'::date) AND valid AND ((type)::text = 'LESSON_SUBSTITUTE'::text) AND (substitute_confirmation_token IS NULL))
        Filter: (from_time >= '2016-01-25 14:23:00.004'::timestamp without time zone)
        Buffers: shared hit=5 read=7437 written=1
        ->  BitmapAnd  (cost=275.25..275.25 rows=3 width=0) (actual time=711.224..711.224 rows=0 loops=1)
              Buffers: shared hit=5 read=7433 written=1
              ->  Bitmap Index Scan on idx_booking_lesson_substitute_day  (cost=0.00..20.50 rows=594 width=0) (actual time=0.080..0.080 rows=72 loops=1)
                    Index Cond: (date_trunc('day'::text, from_time) >= '2016-01-25'::date)
                    Buffers: shared hit=5 read=1
              ->  Bitmap Index Scan on booking_substitute_confirmation_token_key  (cost=0.00..254.50 rows=13594 width=0) (actual time=711.102..711.102 rows=2718734 loops=1)
                    Index Cond: (substitute_confirmation_token IS NULL)
                    Buffers: shared read=7432 written=1
Total runtime: 711.436 ms

Can I prevent using a particular index for a particular query in Postgres?


Solution

  • Clever solution

    For your particular case: a partial unique index that covers only the rare non-null values, so Postgres won't (can't) use the index for the common NULL value.

    CREATE UNIQUE INDEX booking_substitute_confirmation_uni
    ON booking (substitute_confirmation_token)
    WHERE substitute_confirmation_token IS NOT NULL;
    

    It's a textbook use-case for a partial index. Literally! The manual has a similar example and this perfectly matching advice to go with it:

    Finally, a partial index can also be used to override the system's query plan choices. Also, data sets with peculiar distributions might cause the system to use an index when it really should not. In that case the index can be set up so that it is not available for the offending query. Normally, PostgreSQL makes reasonable choices about index usage (e.g., it avoids them when retrieving common values, so the earlier example really only saves index size, it is not required to avoid index usage), and grossly incorrect plan choices are cause for a bug report.

    Keep in mind that setting up a partial index indicates that you know at least as much as the query planner knows, in particular you know when an index might be profitable. Forming this knowledge requires experience and understanding of how indexes in PostgreSQL work. In most cases, the advantage of a partial index over a regular index will be minimal. There are cases where they are quite counterproductive [...]

    You commented:

    The table has a few million rows and just a few thousand rows with non-null values.

    So it's a perfect use-case. It will even speed up queries on non-null values for substitute_confirmation_token because the index is much smaller now.
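
    You can check that the planner behaves as intended with EXPLAIN. A sketch, assuming the partial index above has been created (plan shapes will vary with your data, so no exact output is shown):

    -- The partial index excludes NULLs, so this common case can't use it:
    EXPLAIN SELECT booking_id FROM booking
    WHERE  substitute_confirmation_token IS NULL;

    -- Rare token lookups can still use the (now much smaller) index:
    EXPLAIN SELECT booking_id FROM booking
    WHERE  substitute_confirmation_token = 'some-token';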

    Answer to question

    To answer your original question: it's not possible to "disable" an existing index for a particular query. You would have to drop it, but that's way too expensive.

    Fake drop index

    You could drop an index inside a transaction, run your SELECT and then, instead of committing, use ROLLBACK. That's fast, but be aware that (quoting the manual):

    A normal DROP INDEX acquires exclusive lock on the table, blocking other accesses until the index drop can be completed.

    So this is no good for regular use in multi-user environments.

    BEGIN;
    ALTER TABLE booking DROP CONSTRAINT booking_substitute_confirmation_token_key;  -- drops the backing index, too
    SELECT ...;
    ROLLBACK;  -- so the index is preserved after all
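
    A lighter-weight alternative in the same spirit is a session-local planner setting. This is a sketch, not a targeted fix: SET LOCAL reverts automatically at the end of the transaction, but enable_bitmapscan discourages bitmap scans for the whole query, not just the offending index.

    BEGIN;
    SET LOCAL enable_bitmapscan = off;  -- reset automatically at COMMIT/ROLLBACK
    SELECT ...;
    COMMIT;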
    


    More detailed statistics

    Normally, though, it should be enough to raise the STATISTICS target for the column, so Postgres can more reliably identify common values and avoid the index for those. Try:

    ALTER TABLE booking ALTER COLUMN substitute_confirmation_token SET STATISTICS 1000;
    

    Then run ANALYZE booking; before you try your query again. 1000 is an example value; the default statistics target is 100.
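
    To see why the planner underestimated the IS NULL case (rows=13594 estimated vs. rows=2718734 actual), you can inspect what it currently believes about the column. A sketch using the pg_stats view:

    SELECT null_frac,   -- estimated fraction of NULLs in the column
           n_distinct   -- estimated number of distinct non-null values
    FROM   pg_stats
    WHERE  tablename = 'booking'
    AND    attname   = 'substitute_confirmation_token';

    If null_frac is far below the true share of NULLs, raising the statistics target and re-running ANALYZE should correct it.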