Search code examples
sqlarrayspostgresqlaggregate-functions

PostgreSQL ERROR: cannot accumulate arrays of different dimensionality


I have a Postgres query I'm trying to aggregate an array of arrays with different dimensions, a simplified form of the query can be written as:

SELECT n, ARRAY(SELECT UNNEST(a) FROM (SELECT ARRAY_AGG(a) AS a) a) AS a, SUM(q) AS q
FROM (
    SELECT 'foo' AS n, '{1,2}'::integer[] AS a, 1 AS q
    UNION ALL
    SELECT 'foo' AS n, '{3,4,5}'::integer[] AS a, 1 AS q
) results
GROUP BY n

This results in an error:

Query 1 ERROR at Line 1:
ERROR: cannot accumulate arrays of different dimensionality

The result I'm hoping for is:

n a q
'foo' {1,2,3,4,5} 2

However, if the arrays have the same dimensions it works fine, eg:

SELECT n, ARRAY(SELECT UNNEST(a) FROM (SELECT ARRAY_AGG(a) AS a) a) AS a, SUM(q) AS q
FROM (
    SELECT 'foo' AS n, '{1,2,3}'::integer[] AS a, 1 AS q
    UNION ALL
    SELECT 'foo' AS n, '{4,5,6}'::integer[] AS a, 1 AS q
) results
GROUP BY n

Output:

n a q
'foo' {1,2,3,4,5,6} 2

I have tried asking every LLM on the internet and none can answer it, so it's down to you humans now, please help.


Solution

  • The problem is in the intermediary result of ARRAY_AGG(a) in your nested subquery. That builds a multi-dimensional array, for which arrays with identical cardinalities in the input are required. See:

    The basic difficulty for Postgres is that an array type can contain arbitrary array dimensions, but multidimensional arrays must have matching extents for each dimension.

    There is a lot of fine-print to this. NULL values? Empty arrays? Duplicates? Preserve array positions? ....

    Unnest first

    Simpler cases than yours can be solved by simply unnesting first. But you also want to aggregate a sum. You could run two separate aggregations:

    WITH results(n,a,q) AS (
       VALUES
         ('foo', '{1,2}'::int[], 1)
       , ('foo', '{3,4,5}', 1)
       )
    SELECT n, a, q
    FROM (
       SELECT n, sum(q) AS q
       FROM   results
       GROUP  BY n
       ) q
    FULL JOIN (
       SELECT n, array_agg(elem) AS a
       FROM   results, unnest(a) AS elem
       GROUP  BY n
       ) a USING (n);
    

    Complicates the query, but uses only basic tools.

    Custom aggregate function (my pick!)

    Alternatively, create a custom aggregate function that simply concats arrays - with matching dimensions!

    Create once per database:

    CREATE AGGREGATE array_concat (anycompatiblearray) (
      sfunc = array_cat
    , stype = anycompatiblearray
    , initcond = '{}'
    );
    

    Details in the manual.

    Then:

    SELECT n, array_concat(a) AS a, sum(q) AS q
    FROM  (
        SELECT 'foo' AS n, '{1,2}'::int[] AS a, 1 AS q
        UNION ALL
        SELECT 'foo' AS n, '{3,4,5}'::int[] AS a, 1 AS q
    ) results
    GROUP BY n;
    

    Note: Demands matching array dimensions in the input. I added a violating example in the fiddle.
    (Works fine for all 1-dimensional arrays like in your example.)

    In a quick test on a big table with large arrays, the custom aggregate function was 10 - 100 times faster than any workaround here that has to unnest arrays first.

    Workaround with jsonb

    You posted a related solution. Here is another version with jsonb_path_query():

    SELECT n
         , ARRAY(SELECT jsonb_path_query(jsonb_agg(a), '$[*][*]')::int) AS a
         , sum(q) AS q
    FROM   results
    GROUP  BY n;
    

    But it performed poorly in a test.

    Related:

    fiddle