Search code examples
arrayspostgresqlindexingjsonb

Indexing a jsonb array in Postgres


I try to set up a GIN index but I do not think my index is used when I run the request, whether I use an operator or a function.

Environment

In our table we have a JSONB field (json_aip) containing a Json that looks like that:

{
    "properties": {
        "pdi": {
            "contextInformation": {
                "tags": ["SOME_TAG"]
            },
    },
}

Table creation :

create table t_aip (
    json_aip jsonb,
    [...]
);

CREATE INDEX idx_aip_tags 
ON t_aip 
USING gin ((json_aip -> 'properties' -> 'pdi' -> 'contextInformation' -> 'tags'));

Operator query

We can't use the operator ?| as we use JDBC. But rumors indicate I should see my index when I run that type of query.

EXPLAIN ANALYZE SELECT count(*)  
FROM storage.t_aip 
WHERE json_aip#>'{properties,pdi,contextInformation,tags}' ?| array['SOME_TAG']

Result:

  Aggregate

  (cost=27052.16..27052.17 rows=1 width=8) (actual time=488.085..488.087 rows=1 loops=1)
  ->  Seq Scan on t_aip  (cost=0.00..27052.06 rows=42 width=0) (actual time=0.134..456.978 rows=16502 loops=1)
        Filter: ((json_aip #> '{properties,pdi,contextInformation,tags}'::text[]) ?| '{SOME_TAG}'::text[])
        Rows Removed by Filter: 17511
Planning time: 23.202 ms
Execution

time: 488.449 ms

Functional query

EXPLAIN ANALYZE SELECT count(*)  
FROM storage.t_aip 
WHERE jsonb_exists_any(
    json_aip#>'{properties,pdi,contextInformation,tags}', 
    array['SOME_TAG']
)

Result:

QUERY PLAN
Aggregate  (cost=27087.00..27087.01 rows=1 width=8) (actual time=369.931..369.933 rows=1 loops=1)
  ->  Seq Scan on t_aip  (cost=0.00..27052.06 rows=13979 width=0) (actual time=0.173..350.437 rows=16502 loops=1)
        Filter: jsonb_exists_any((json_aip #> '{properties,pdi,contextInformation,tags}'::text[]), '{SOME_TAG}'::text[])
        Rows Removed by Filter: 17511
Planning time: 56.021 ms
Execution time: 370.252 ms

There is nothing about the index at all. Any help would be much appreciated !

I think my index is wrong because it considers that at the end of the path json_aip -> 'properties' -> 'pdi' -> 'contextInformation' -> 'tags' it index a String whether that's an array. That's my opinion.


Solution

  • There is a general rule that you have to use exactly the same expression both in an index and a query to use the index. With this index:

    CREATE INDEX idx_aip_tags 
    ON t_aip 
    USING gin ((json_aip#>'{properties,pdi,contextInformation,tags}'));
    

    the query will use the index

    EXPLAIN ANALYZE 
    SELECT count(*)  
    FROM t_aip 
    WHERE json_aip#>'{properties,pdi,contextInformation,tags}' ?| array['SOME_TAG']
    
                                                               QUERY PLAN                                                            
    ---------------------------------------------------------------------------------------------------------------------------------
     Aggregate  (cost=149.97..149.98 rows=1 width=0) (actual time=27.783..27.783 rows=1 loops=1)
       ->  Bitmap Heap Scan on t_aip  (cost=20.31..149.87 rows=40 width=0) (actual time=1.504..25.726 rows=20000 loops=1)
             Recheck Cond: ((json_aip #> '{properties,pdi,contextInformation,tags}'::text[]) ?| '{SOME_TAG}'::text[])
             Heap Blocks: exact=345
             ->  Bitmap Index Scan on idx_aip_tags  (cost=0.00..20.30 rows=40 width=0) (actual time=1.455..1.455 rows=20000 loops=1)
                   Index Cond: ((json_aip #> '{properties,pdi,contextInformation,tags}'::text[]) ?| '{SOME_TAG}'::text[])
    

    Note that the GIN index supports also @> operator:

    SELECT count(*)  
    FROM t_aip 
    WHERE json_aip#>'{properties,pdi,contextInformation,tags}' @> '["SOME_TAG"]'
    

    but be careful when searching for more than one tag:

    SELECT count(*)  
    FROM t_aip 
    -- this gives objects containing both tags:
    -- WHERE json_aip#>'{properties,pdi,contextInformation,tags}' @> '["SOME_TAG", "ANOTHER_TAG"]'
    -- and this gives objects with any of two tags:
    WHERE json_aip#>'{properties,pdi,contextInformation,tags}' @> ANY(ARRAY['["SOME_TAG"]', '["ANOTHER_TAG"]']::jsonb[])