Search code examples
jsonpostgresqlpostgresql-14

PostgreSQL: Efficiently aggregate a list query result into JSON object


I need to return a result from a query in which I match one row from a table, then aggregate in JSON from up to 500,000 text values that reference that row.

  • The JSON must be a map where the names are data points. It needs to be a key:value object, since then it will be serialized into a hash map in my golang backend, which will allow me to search it in constant time. I don't care what the values are empty strings or null are preferred to reduce transport and serialization time.

Initially I tried using jsonb_build_object but it was very slow. Here's a simplified version of the query:

SELECT
    table_a.id,
    table_a.name,
    table_a.description,
    table_a.created,
    table_a.createdby,
    table_a.modified,
    table_a.modifiedby,
    jsonb_build_object( -- the aggregate takes a very long time
        SELECT
            table_b.item
        FROM
            table_b
        WHERE
            table_a.id = table_b.table_a_id
    )
FROM
    table_a
WHERE 
    table_a.id = <some_id_input>;

My next step was to get an array, then convert it to JSON. This proved much more efficient:

SELECT
    table_a.id,
    table_a.name,
    table_a.description,
    table_a.created,
    table_a.createdby,
    table_a.modified,
    table_a.modifiedby,
    to_jsonb( ARRAY (
        SELECT
            table_b.item
        FROM
            table_b
        WHERE
            table_a.id = table_b.table_a_id
    ))
FROM
    table_a
WHERE 
    table_a.id = <some_id_input>;

Even though it's more efficient, it gives me a JSON array when I need a JSON object...

Is there an easy and efficient way in PostgreSQL 14 to produce what I need?


Solution

  • Join to the aggregated result:

    SELECT
        table_a.id,
        table_a.name,
        table_a.description,
        table_a.created,
        table_a.createdby,
        table_a.modified,
        table_a.modifiedby,
        b.items
    FROM table_a
      LEFT JOIN (
        select table_a_id, jsonb_agg(table_b.item) as items
        FROM table_b
        GROUP by table_a_id
      ) b on b.table_a_id = table_a.id
    WHERE 
        table_a.id = <some_id_input>;
    

    Or maybe using a lateral join as you restrict this to a single table_a.id:

    SELECT
        table_a.id,
        table_a.name,
        table_a.description,
        table_a.created,
        table_a.createdby,
        table_a.modified,
        table_a.modifiedby,
        b.items
    FROM table_a
      LEFT JOIN LATERAL (
        select jsonb_agg(table_b.item) as items
        FROM table_b
        WHERE table_b.table_a_id = table_a.id
      ) b on true 
    WHERE 
        table_a.id = <some_id_input>;
    

    I am not sure, but I wouldn't be surprised if using a native array (array_agg()) is more efficient than JSON to create an array of 500.000 items