Search code examples
sqlarraysperformancepostgresqlmadlib

How to unnest a 2d array into a 1d array quickly in PostgreSQL?


I have a really large array that have I computed with Apache Madlib and I would like to apply an operation to each single array in that 2d array.

I have found code that can help me unnest it from this related answer. However, the code is miserably slow on this really large 2d array (150,000+ 1d float arrays). While unnest() only takes a few seconds to run, even after waiting for several minutes the code has not completed.

Surely, there must be a faster way to unnest the large 2d array into smaller 1d arrays? Bonus point if that solution uses Apache Madlib. I did find one lead buried in the documentation called deconstruct_2d_array, however, when I try to call that function on the matrix, it fails with the following error:

ERROR: Function "deconstruct_2d_array(double precision[])": Invalid type conversion. Internal composite type has more elements than backend composite type.


Solution

  • The function you found in my old answer does not scale well for big arrays. I never thought of arrays your size, which should probably be a set (a table) instead.

    Be that as it may, this PL/pgSQL function can replace the one in the referenced answer. Requires Postgres 9.1 or later.

    CREATE OR REPLACE FUNCTION unnest_2d_1d(ANYARRAY, OUT a ANYARRAY)
      RETURNS SETOF ANYARRAY
      LANGUAGE plpgsql IMMUTABLE STRICT AS
    $func$
    BEGIN
       FOREACH a SLICE 1 IN ARRAY $1 LOOP
          RETURN NEXT;
       END LOOP;
    END
    $func$;
    

    40x faster in my test on a big 2d-array in Postgres 9.6.

    STRICT to avoid an exception for NULL input (as commented by IamIC):

    ERROR: FOREACH expression must not be null