Search code examples
postgresqljsonbgenerated-columnspostgresql-16

In Postgresql is there a way to flatten or aggregate nested JSONB arrays that can be used with generated stored columns?


So imagine I have a table with a jsonb column data (for the sake of example, not because it's a good column name) that contains an array of json objects with a key states that is an array of USA state abbreviations:

[ { states: ['AZ', 'CA'], ... }, { states: ['NY','CO'], ... }, ... ]

I'd like to aggregate all the states in the JSON objects into a list of distinct states and make it into a generated stored column states. (e.g. ['AZ', 'CA', 'NY', 'CO', ...])

One not great way to select such a list is:

select jsonb_path_query_array(
  ( select jsonb_agg(b) from
    (select distinct jsonb_array_elements(a) as state from jsonb_array_elements(
       jsonb_path_query_array(data, '$[*].states')
    ) as a) as b
  ), '$[*].state') as states
from myTable

This (a) seems overly complicated for something as simple as mapping an array by the states key and calling whatever the PG equivalent of flatten is, and (b) has select in it, and so can't be used for generated columns.

The other ways I can imagine doing it all involve lateral joins.

Given that all the data is in a column, it seems like it should be possible to do some kind of transformation on it without selects or joins for creating a generated stored column. But I can't seem to figure out how to do this without using select and/or join lateral.

Is there a way to do this for a generated stored column in PostgreSQL?

(Currently on PG 16.x)


Solution

  • This seems overly complicated for something as simple as mapping an array by the states key and calling whatever the PG equivalent of flatten is

    It is. You don't need jsonb_path_query_array, just use jsonb_path_query which returns setof jsonb and a path that leads to each string itself:

    SELECT jsonb_agg(DISTINCT state) FROM jsonb_path_query(data, '$[*].states[*]')
    

    (online demo)

    If you prefer jsonb_array_elements over json paths, it would be a lateral function call:

    SELECT jsonb_agg(DISTINCT state)
    FROM jsonb_array_elements(data) obj,
         jsonb_array_elements(obj->'states') state
    

    (online demo)

    This has select in it, and so can't be used for generated columns.

    I think that is unavoidable (at least if you need the distinct - otherwise jsonb_path_query_array(data, '$[*].states[*]') would do). However you can always write a custom function, do the json manipulation inside, mark it as immutable, and use it for your generated column.