Search code examples
postgresqlindexingjsonb

PostgreSQL GIN index on nested objects in JSONB not working


Disclaimer: I read every "similar question" but still can't get it to work.

I have the following table:

CREATE TABLE test (
id SERIAL PRIMARY KEY,
content JSONB
);

Generating test data

INSERT INTO test(content)
select ('[{"uId": "'|| g.value ||'", "data": "'|| g.value ||'"},{"uId": "'|| g.value +1 ||'","data": "'|| g.value +1 || '"},{"uId": "'|| g.value +2 ||'","data": "'|| g.value +2 ||'"}]')::jsonb
from generate_series(1,100000) as g(value)
);

Which gives me content that looks like

[{"uId": "1", "data": "1"}, {"uId": "2", "data": "2"}, {"uId": "3", "data": "3"}]

It's an JSON array of objects. Each having a uId field.

I want to query that array and get an object by it's id. For that I am currently using:

SELECT id,elem
FROM test,
LATERAL jsonb_array_elements(content) AS elem
WHERE elem @> '{"uId": "1"}';

Which does indeed return the desired result:

1 {"uId": "1", "data": "1"}

The query is unfortunately very slow. In production I have in total about 13Mio objects across ~22.000 rows. The query takes about one minute there.

I tried every possible way to define an index but it never get's picked up. Even after

Analyse test;
SET enable_seqscan = off; #just for tests of course 

I tried

CREATE INDEX ON test USING GIN (content);
CREATE INDEX ON test USING GIN (content jsonb_path_ops);
CREATE INDEX ON test USING GIN (
  (jsonb_path_query_array(content, '$[*].uId'))
);
CREATE INDEX ON test USING btree ((content ->> 'uId'));

And many, many different select statements. They either don't return the correct result or don't use the index.

Any help is very welcome.


Solution

  • The GIN index works fine with:

    CREATE INDEX ON test USING GIN (content);
    

    The problem is the lateral join on jsonb_array_elements which prevents the index from being used. If you need to resultset in exactly that format you could first reduce the table with a CTE using a query that is actually able to use the index. I came up with this:

    EXPLAIN ANALYSE
    WITH rec AS (
      SELECT * FROM test WHERE content @> '[{"uId": "5"}]'
    )
    SELECT * FROM rec,
    LATERAL jsonb_array_elements(content) AS elem
    WHERE elem @> '{"uId": "5"}';
    

    Which will first filter the table with SELECT * FROM test WHERE content @> '[{"uId": "5"}]' which uses the index without issue. With the example data, the lateral join then only has to process three rows which can be done sequentially.

    See here for an interactive example with query plans: https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/14133