Search code examples
sqljsonpostgresqlaggregate

Postgres aggregate between JSON columns


I have a Postgres DB with tables for templates and responses, each template defines how many rows and items it has, the response places the items in rows. I want to be able to query a merged response with the "average" placement across all responses for any template.

template_post

id data
1 {"items":[{"id":0},{"id":1},{"id":2},{"id":3}],"rows":[{"id":"0","title":"First row"},{"id":"1","title":"Second row"},{"id":"3","title":"Third row"},{"id":"4","title":"Fourth row"}]}
2 ...

response_post

id template_id [FK] data
1 1 {"filled_rows":[{"row_id":4,"item_ids":[0,1,3]}]}
2 1 {"filled_rows":[{"row_id":2,"item_ids":[2]}]}
3 1 {"filled_rows":[{"row_id":0,"item_ids":[1]},{"row_id":2,"item_ids":[2]},{"row_id":4,"item_ids":[3]}]}

In this case, the query should return the following merged response:

{
  "filled_rows": [
    {
      "row_id": 0,
      "item_ids": [1]
    },
    {
      "row_id": 2,
      "item_ids": [2]
    },
    {
      "row_id": 4,
      "item_ids": [0,3]
    }
  ]
}

In cases of ambiguity, when an item appears the same number of times in more than one row, it should be placed in the first one of those rows.


Solution

  • You need a few levels of aggregation here.

    • Firstly, in a correlated subquery, break out the filled_rows and item_ids into separate rows.
    • Aggregate by item_id and take the min row_id.
    • Aggregate by row_id and take a JSON array of item_ids.
    • Aggregate the whole thing up to get all rows in one JSON
    SELECT
      tp.id,
      r.result
    FROM template_post tp
    CROSS JOIN LATERAL (
        SELECT
          jsonb_build_object(
            'filled_rows', jsonb_agg(byRow.*)
          ) AS result
        FROM (
            SELECT
              byItem.row_id,
              jsonb_agg(byItem.item_id) AS item_ids
            FROM (
                SELECT
                  MIN((r.value ->> 'row_id') ::int) AS row_id,
                  i.value::int AS item_id
                FROM response_post rp
                CROSS JOIN jsonb_array_elements(rp.data -> 'filled_rows') r
                CROSS JOIN jsonb_array_elements_text(r.value -> 'item_ids') i
                WHERE rp.template_id = tp.id
                GROUP BY
                  i.value::int
            ) byItem
            GROUP BY
              byItem.row_id
        ) byRow
    ) r;
    

    db<>fiddle