Search code examples
postgresqljsonbset-returning-functions

Is it safe to use jsonb_array_elements() twice in the same SELECT list?


Is it safe to select multiple columns with jsonb_array_elements in a single SELECT statement? Is there a guarantee that the order of the expanded elements in one column is the same as the order in the second column?

Example: My table contains a column data that contains a json array. Each element of the array is an object with two properties (id and name):

                                data                                
--------------------------------------------------------------------
 [{"id": "11", "name": "entry11"}, {"id": "12", "name": "entry12"}]
 [{"id": "21", "name": "entry21"}, {"id": "22", "name": "entry22"}]

If I run

with my_table(data) as (values
    ('[{"id": "11", "name":"entry11"},{"id": "12", "name": "entry12"}]'::jsonb),
    ('[{"id": "21", "name":"entry21"},{"id": "22", "name": "entry22"}]'::jsonb)
)
select 
  jsonb_array_elements(data)->'id' as id,
  jsonb_array_elements(data)->'name' as name
from my_table
;

I get the expected result:

  id  |   name    
------+-----------
 "11" | "entry11"
 "12" | "entry12"
 "21" | "entry21"
 "22" | "entry22"

My Question: Is there a risk that the name entry22 could end up in the row with id 21 as the two invocations of jsonb_array_elements are handled independently by the database? My experiments (also with larger tables) suggest that it always works. But as relational databases usually don't have a stable ordering of rows I wonder if I can rely on that result.


Solution

  • Yes, there is a guarantee. But it's a weak one.

    id and name will stay in sync (come from the same nested object) because the two set-returning functions are evaluated in "lockstep". The manual:

    If there is more than one set-returning function in the query's select list, the behavior is similar to what you get from putting the functions into a single LATERAL ROWS FROM( ... ) FROM-clause item. For each row from the underlying query, there is an output row using the first result from each function, then an output row using the second result, and so on. If some of the set-returning functions produce fewer outputs than others, null values are substituted for the missing data, so that the total number of rows emitted for one underlying row is the same as for the set-returning function that produced the most outputs. Thus the set-returning functions run “in lockstep” until they are all exhausted, and then execution continues with the next underlying row.

    Bold emphasis mine.

    Since both your calls are guaranteed to return the same number of rows, this even works reliably before Postgres 10 (where the behavior of multiple set-returning functions in the SELECT list was reformed (sanitized). See:

    Unlike json, jsonb has a deterministic sort order of nested elements. But the only relevant aspect here is the order of array items, and that is always significant and preserved in json and jsonb alike.

    Optimize query

    Elaborating on what we just learned about the internal workings, your query:

    SELECT jsonb_array_elements(data) -> 'id'   AS id
         , jsonb_array_elements(data) -> 'name' AS name
    FROM   tbl;
    

    ... is equivalent to:

    SELECT t.id, x.i->'id' AS id, x.n->'name' AS name
    FROM   tbl t
    CROSS  JOIN LATERAL ROWS FROM (jsonb_array_elements(t.data), jsonb_array_elements(t.data)) x(i, n);
    

    Makes it even more obvious, that it should be simplified to a single function call in a subquery. Then fields cannot get out of sync to begin with:

    SELECT t.id, x->'id' AS id, x->'name' AS name
    FROM   tbl t
    CROSS  JOIN LATERAL jsonb_array_elements(t.data) x;
    

    Or, with minimal syntax:

    SELECT id, x->'id' AS id, x->'name' AS name
    FROM   tbl, jsonb_array_elements(data) x;
    

    fiddle

    When returning multiple, separate fields from the same function call, move the function call to a subquery as demonstrated. Else you risk (costly) repeated evaluation. See:

    True issue with (multiple) SRF in the SELECT list

    All of the above harbor a trap. If a SRF in the SELECT list does not return a row (like when jsonb_array_elements() finds an empty array), the row is removed from the result. When there are multiple SRF in the SELECT list, the row is removed when all of them come up empty. This can lead to surprising results. From my experience, few are aware of the subtle implications.

    If that side effect is intended, it's much clearer spelled out as CROSS JOIN as demonstrated above.

    If that side effect is not intended, and you'd rather preserve all rows, use LEFT JOIN LATERAL ... ON true instead:

    SELECT t.id, x->'id' AS id, x->'name' AS name
    FROM   tbl t
    LEFT   JOIN LATERAL jsonb_array_elements(t.data) x ON true;

    fiddle

    Use this query, unless you know better. See:

    Aside: jsonb_populate_recordset() for this particular query

    There is a more efficient way for your particular query with jsonb_populate_recordset(): faster, and with implicit type casts.

    Create a fitting composite type once if you don't have one:

    CREATE TYPE my_rowtype AS (id int, name text);
    

    Then:

    SELECT t.id, x.id, x.name
    FROM   tbl t
    LEFT   JOIN LATERAL jsonb_populate_recordset(null::my_rowtype, t.data) x ON true
    

    fiddle

    Related: