Search code examples
postgresqlindexingjsonbpostgresql-9.6

Postgres 9.6 - jsonb ?| operator missing index, but present with ? operator


I'm making some queries in postgres 9,6 involving a jsonb column, an int column, two indexes (one for each column) and the ?| operator. The jsonb part of the query is using its index just fine, but the int column is not.

The weird thing is that both indexes work as expected when splitting the ?| condition into multiple or ? conditions.

Here are the specifics of what I'm working with.

  • int index

    • CREATE INDEX team_id_index ON revisions (team_id);
  • jsonb index

    • CREATE INDEX _group_ids_gin_index ON revisions USING GIN(_group_ids jsonb_ops);

half-working ?| query

select * 
from revisions 
where team_id = 1 
and _group_ids ?| '{"0","91"}';
Bitmap Heap Scan on revisions  (cost=224.42..2214.66 rows=92 width=992) (actual time=7.783..40.178 rows=4454 loops=1)
  Recheck Cond: (_group_ids ?| '{0,91}'::text[])
  Filter: (team_id = 1)
  Rows Removed by Filter: 63027
  Heap Blocks: exact=5129
  ->  Bitmap Index Scan on _group_ids_gin_index  (cost=0.00..224.40 rows=587 width=0) (actual time=7.086..7.086 rows=67481 loops=1)
        Index Cond: (_group_ids ?| '{0,91}'::text[])
Planning time: 0.142 ms
Execution time: 40.401 ms

fully-working ? query

select * 
from revisions 
where team_id = 1 
and (_group_ids ? '0' or _group_ids ? '91');
Bitmap Heap Scan on revisions  (cost=2414.55..3091.44 rows=184 width=992) (actual time=12.965..16.162 rows=4454 loops=1)
  Recheck Cond: (((_group_ids ? '0'::text) OR (_group_ids ? '91'::text)) AND (team_id = 1))
  Heap Blocks: exact=818
  ->  BitmapAnd  (cost=2414.55..2414.55 rows=184 width=0) (actual time=12.844..12.844 rows=0 loops=1)
        ->  BitmapOr  (cost=424.89..424.89 rows=1173 width=0) (actual time=7.329..7.329 rows=0 loops=1)
              ->  Bitmap Index Scan on _group_ids_gin_index  (cost=0.00..212.40 rows=587 width=0) (actual time=6.439..6.439 rows=67076 loops=1)
                    Index Cond: (_group_ids ? '0'::text)
              ->  Bitmap Index Scan on _group_ids_gin_index  (cost=0.00..212.40 rows=587 width=0) (actual time=0.887..0.887 rows=405 loops=1)
                    Index Cond: (_group_ids ? '91'::text)
        ->  Bitmap Index Scan on team_id_index  (cost=0.00..1989.36 rows=91858 width=0) (actual time=5.218..5.218 rows=90229 loops=1)
              Index Cond: (team_id = 1)
Planning time: 0.154 ms
Execution time: 16.540 ms

It's not a big deal if I just need to rewrite all my queries using ? instead of ?|, but the mystery of WHY it's doing this is driving me crazy. Please help for my sanity!


Solution

  • The plans are different because the estimates are different. Selectivity of ?| is estimated without looking to see how long the list on the RHS is, while the Or planning does implicitly take this into account.

    Bitmap Index Scan on _group_ids_gin_index ... rows=587

    BitmapOr ... rows=1173

    The smaller you think the bitmap is going to be to start with, the less sense it makes to try to make it even smaller by scanning a separate index to add in a BitmapAnd.

    Planning in this area has not improved since 9.6, and you don't have a lot of great options other than rewriting the queries as you already indicate.

    Note that the estimates are way off in either case, it is just that one way of being way off accidentally leads to better plans. If you used native arrays rather than encapsulating them in JSONB, it would probably have better estimates, so could probably come up with better plans more reliably than on accident.