Search code examples
sqlpostgresqlpostgresql-12

PostgreSQL create index on JSONB[]


Consider a table defined as follows:

CREATE TABLE test (
    id int4 NOT NULL,
    tag_counts _jsonb NOT NULL DEFAULT ARRAY[]::jsonb[]
);

INSERT INTO test(id, tag_counts) values(1,array['{"type":1, "count":4}','{"type":2, "count":10}' ]::jsonb[])

How can I create an index on json key type and how can I query on it?

Edit: Previously, there were no indexes on json keys and select queries used an unnest operation as shown below:

select * from (SELECT unnest(tag_counts) as tc
FROM public.test) as t
where tc->'type' = '2';

The problem is, if the table has a large number of rows, the above query will not only include a full table scan, but also filtering through each jsonb array.


Solution

  • There is a way to index this, not sure how fast it will be.

    If that was a "regular" jsonb column, you could use a condition like where tag_counts @> '[{"type": 2}]' which can use a GIN index on the column.

    You can use that operator if you convert the array to "plain" json value:

    select *
    from test
    where to_jsonb(tag_counts) @> '[{"type": 2}]'
    

    Unfortunately, to_jsonb() is not marked as immutable (I guess because of potential timestamp conversion in there) which is a requirement if you want to use an expression in an index.

    But for your data, this is indeed immutable, so we can create a little wrapper function:

    create function as_jsonb(p_input jsonb[])
    returns  jsonb
    as
    $$
      select to_jsonb(p_input);
    $$
    language sql
    immutable;
    

    And with that function we can create an index:

    create index on test using gin ( as_jsonb(tag_counts) jsonb_path_ops);
    

    You will need to use that function in your query:

    select *
    from test
    where as_jsonb(tag_counts) @> '[{"type": 2}]'
    

    On a table with a million rows, I get the following execution plan:

    Bitmap Heap Scan on stuff.test  (cost=1102.62..67028.01 rows=118531 width=252) (actual time=15.145..684.062 rows=147293 loops=1)
      Output: id, tag_counts
      Recheck Cond: (as_jsonb(test.tag_counts) @> '[{"type": 2}]'::jsonb)
      Heap Blocks: exact=25455
      Buffers: shared hit=25486
      ->  Bitmap Index Scan on ix_test  (cost=0.00..1072.99 rows=118531 width=0) (actual time=12.347..12.356 rows=147293 loops=1)
            Index Cond: (as_jsonb(test.tag_counts) @> '[{"type": 2}]'::jsonb)
            Buffers: shared hit=31
    Planning:
      Buffers: shared hit=23
    Planning Time: 0.444 ms
    Execution Time: 690.160 ms