Currently I'm writing queries against a JSONB table with 8 million+ rows. How can I query from the parent
and the friends
objects in the most efficient manner possible?
Query (Postgres 9.6):
select distinct id, data->>'_id' jsonID, data->>'email' email, friends->>'name' friend_name, parent->>'name' parent
from temp t
CROSS JOIN jsonb_array_elements(t.data->'friends') friends
CROSS JOIN jsonb_array_elements(friends->'parent') parent
where friends ->> 'name' = 'Chan Franco'
and parent->>'name' = 'Hannah Golden'
Example DDL (with data): https://pastebin.com/byN7uyKx
Your regularly structured data would be cleaner, smaller and faster as normalized relational design.
That said, to make the setup you have much faster (if not as fast as a normalized design with matching indexes), add a GIN index on the expression data->'friends'
:
CREATE INDEX tbl_data_friends_gin_idx ON tbl USING gin ((data->'friends'));
Then add a matching WHERE
clause to our query with the contains operator @>
:
SELECT DISTINCT -- why DISTINCT ?
id, data->>'_id' AS json_id, data->>'email' AS email
, friends->>'name' AS friend_name, parent->>'name' AS parent
FROM tbl t
CROSS JOIN jsonb_array_elements(t.data->'friends') friends
CROSS JOIN jsonb_array_elements(friends->'parent') parent
WHERE t.data->'friends' @> '[{"name": "Chan Franco", "parent": [{"name": "Hannah Golden"}]}]'
AND friends->>'name' = 'Chan Franco'
AND parent ->>'name' = 'Hannah Golden';
db<>fiddle here
The huge difference: With the help of the index, Postgres can now identify matching rows before unnesting each an every nested "friends" array in the whole table. Only after having identified matching rows in the underlying table, jsonb_array_elements()
is called and resulting rows with qualifying array elements are kept.
Note that the search expression has to be valid JSON, matching the structure of the JSON array data->'friends'
- including the outer brackets []
. But omit all key/value pairs that are not supposed to serve as filter.
Related:
I avoided the table name temp
as this is an SQL key word, that might lead to confusing errors. Using the name tbl
instead.