Search code examples
google-bigqueryaggregate-functionsudf

BigQuery User Defined Aggregation Function?


I know I can define a User Defined Function in order to perform some custom calculation. I also know I can use the 'out-of-the-box' aggregation functions to reduce a collection of values to a single value when using a GROUP BY clause.

Is it possible to define a custom user-defined, Aggregation Function to use with a GROUP BY clause?


Solution

  • Turns out that this IS possible (as long as the groups we seek to aggregate are of a reasonable size in memory) with a little bit of 'glue' - namely the ARRAY_AGG function

    The steps are as follows:

    1. Create a UDF with an input parameter of type ARRAY<T> where T is the type of value you want to aggregate.
    2. Use the ARRAY_AGG function in the query with the GROUP BY clause to generate an array of T and pass into your UDF.

    As a concrete example:

    CREATE TEMP FUNCTION aggregate_fruits(fruits ARRAY<STRING>)
    RETURNS STRING
    LANGUAGE js AS """
    return "my fruit bag contains these items: " + fruits.join(",");
    """;
    
    WITH fruits AS
    (SELECT "apple" AS fruit
    UNION ALL SELECT "pear" AS fruit
    UNION ALL SELECT "banana" AS fruit)
    
    SELECT aggregate_fruits(ARRAY_AGG(fruit))
    FROM fruits