Search code examples
arrayspostgresqldatabase-designjsonbpostgresql-performance

Most efficient way to query data nested deep in JSON arrays?


Currently I'm writing queries against a JSONB table with 8 million+ rows. How can I query from the parent and the friends objects in the most efficient manner possible?

Query (Postgres 9.6):

select distinct id, data->>'_id' jsonID, data->>'email' email, friends->>'name' friend_name, parent->>'name' parent
from temp t
CROSS JOIN jsonb_array_elements(t.data->'friends') friends
CROSS JOIN jsonb_array_elements(friends->'parent') parent
where friends ->> 'name' = 'Chan Franco'
and parent->>'name' = 'Hannah Golden'

Example DDL (with data): https://pastebin.com/byN7uyKx


Solution

  • Your regularly structured data would be cleaner, smaller and faster as normalized relational design.

    That said, to make the setup you have much faster (if not as fast as a normalized design with matching indexes), add a GIN index on the expression data->'friends':

     CREATE INDEX tbl_data_friends_gin_idx ON tbl USING gin ((data->'friends'));
    

    Then add a matching WHERE clause to our query with the contains operator @>:

    
    SELECT DISTINCT  -- why DISTINCT ?
           id, data->>'_id' AS json_id, data->>'email' AS email
         , friends->>'name' AS friend_name, parent->>'name' AS parent
    FROM   tbl t
    CROSS  JOIN jsonb_array_elements(t.data->'friends') friends
    CROSS  JOIN jsonb_array_elements(friends->'parent') parent
    WHERE  t.data->'friends' @> '[{"name": "Chan Franco", "parent": [{"name": "Hannah Golden"}]}]'
    AND    friends->>'name' = 'Chan Franco'
    AND    parent ->>'name' = 'Hannah Golden';
    

    db<>fiddle here

    The huge difference: With the help of the index, Postgres can now identify matching rows before unnesting each an every nested "friends" array in the whole table. Only after having identified matching rows in the underlying table, jsonb_array_elements() is called and resulting rows with qualifying array elements are kept.

    Note that the search expression has to be valid JSON, matching the structure of the JSON array data->'friends' - including the outer brackets []. But omit all key/value pairs that are not supposed to serve as filter.

    Related:

    I avoided the table name temp as this is an SQL key word, that might lead to confusing errors. Using the name tbl instead.