Search code examples
postgresqlbtree-gist

postgres not using btree_gist index


I have a huge table with a primary key and a btree_gist index. When I query for the columns in the btree_gist index, I would expect that the index is used and the query performs quite fast. However, the optimiser always performs an index scan on the primary key and the filters.

Example:

create table test1 (
    id1 bigint not null,
    id2 bigint not null,
    validtime tstzrange not null,
    data float);
alter table test1 add constraint pk_test1 primary key (id1, id2, validtime);
alter table test1 add constraint ex_test1_validtime exclude using gist (id1 with =, id2 with =, validtime with &&);

The table contains about 1.2 billion rows, the query I wonder about returns only some hundred rows, but takes a long time:

select * from test1 where id1=1 and id2=1 and validtime && '[2020-01-01,2020-02-01)';
(about 3s)

Query plan:

explain select * from test1 where id1=1 and id2=1 and validtime && '[2020-01-01,2020-02-01)';
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Index Scan using pk_test1 on test1  (cost=0.70..24.68 rows=1 width=46)
   Index Cond: ((id1 = 1) AND (id2 = 1))
   Filter: (validtime && '["2020-01-01 00:00:00+00","2020-02-01 00:00:00+00")'::tstzrange)

The reason for the bad performance is obviously that ten thousands of rows are read and filtered in the time criteria.

I wonder why postgres does not use the btree_gist.


I have another, slightly different table where the btree_gist is used but in a very different way than I would expect. This table has about 160 million rows.

create table test2 (
    id1 bigint not null,
    validtime tstzrange not null);                                                                                                                                                                                                                                                                                                                                              
alter table test2 add constraint pk_test2 primary key (id1, validtime);
alter table test2 add constraint ex_test2_validtime exclude using gist (id1 with =, validtime with &&);

Here, the execution plan looks like this:

select * from test2 where id1=1 and validtime && '[2020-01-01,2020-02-01)';
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test2  (cost=1933.19..1937.20 rows=1 width=62)
   Recheck Cond: ((id1 = 1) AND (validtime && '["2020-01-01 00:00:00+00","2020-02-01 00:00:00+00")'::tstzrange))
   ->  BitmapAnd  (cost=1933.19..1933.19 rows=1 width=0)
         ->  Bitmap Index Scan on pk_test2  (cost=0.00..574.20 rows=11417 width=0)
               Index Cond: (id1 = 1)
         ->  Bitmap Index Scan on ex_test2_validtime  (cost=0.00..1358.74 rows=17019 width=0)
               Index Cond: (validtime && '["2020-01-01 00:00:00+00","2020-02-01 00:00:00+00")'::tstzrange)

Why the two bitmap index scans, couldn't it all be done with one index scan using the btree_gist index?


Solution

  • Your answer is correct, but I want to add some background as to why this happens.

    An index in PostgreSQL only supports the operators that belong to the operator family of its operator class. For GiST indexes on bigint, that is

    SELECT ao.amoplefttype::regtype,
           op.oprname,
           ao.amoprighttype::regtype
    FROM pg_opfamily AS of
       JOIN pg_am AS am ON of.opfmethod = am.oid
       JOIN pg_amop AS ao ON of.oid = ao.amopfamily
       JOIN pg_operator AS op ON ao.amopopr = op.oid
    WHERE am.amname = 'gist'
      AND ao.amoplefttype = 'bigint'::regtype;
    
     amoplefttype │ oprname │ amoprighttype 
    ══════════════╪═════════╪═══════════════
     bigint       │ <       │ bigint
     bigint       │ <=      │ bigint
     bigint       │ =       │ bigint
     bigint       │ >=      │ bigint
     bigint       │ >       │ bigint
     bigint       │ <>      │ bigint
     bigint       │ <->     │ bigint
    (7 rows)
    

    which explains why you have to cast to bigint for the index to get used.

    This is surprising if you are used to PostgreSQL, because PostgreSQL does not need such a cast with a B-tree index. The explanation is that the operator family for btree has more operators:

    SELECT ao.amoplefttype::regtype,
           op.oprname,
           ao.amoprighttype::regtype
    FROM pg_opfamily AS of
       JOIN pg_am AS am ON of.opfmethod = am.oid
       JOIN pg_amop AS ao ON of.oid = ao.amopfamily
       JOIN pg_operator AS op ON ao.amopopr = op.oid
    WHERE am.amname = 'btree'
      AND ao.amoplefttype = 'bigint'::regtype;
    
     amoplefttype │ oprname │ amoprighttype 
    ══════════════╪═════════╪═══════════════
     bigint       │ <       │ bigint
     bigint       │ <=      │ bigint
     bigint       │ =       │ bigint
     bigint       │ >=      │ bigint
     bigint       │ >       │ bigint
     bigint       │ <       │ smallint
     bigint       │ <=      │ smallint
     bigint       │ =       │ smallint
     bigint       │ >=      │ smallint
     bigint       │ >       │ smallint
     bigint       │ <       │ integer
     bigint       │ <=      │ integer
     bigint       │ =       │ integer
     bigint       │ >=      │ integer
     bigint       │ >       │ integer
    (15 rows)
    

    and equality comparison between bigint and integer is among them.

    You could have used a regular B-tree index to support your query if you had written the condition using >= and < rather than &&, which would make the cast unnecessary, but of course you don't want to create a second index if there is already the index from the exclusion constraint.