Search code examples
google-bigquerybigquery-udf

How to move this group/concat logic into a function?


Given a column of integers

ids AS (
  SELECT
    id
  FROM
    UNNEST([1, 2, 3, 4, 5, 6, 7]) AS id)

I'd like to convert them into the following (batched) string representations:

"1,2,3,4,5"
"6,7"

Currently, I do this as follows:

SELECT
  STRING_AGG(CAST(id AS STRING), ',')
FROM (
  SELECT
    DIV(ROW_NUMBER() OVER() - 1, 5) batch,
    id
  FROM
    ids)
GROUP BY
  batch

Since I use this on multiple occasions, I'd like to move this into a function.

Is this possible, and if so how?

(I guess, since we can't pass the table (ids), we'd need to pass an ARRAY<INT64>, but that would be ok.)


Solution

  • I think you might consider below 2 approches.

    UDF

    • returns result as ARRAY<STRING>.
    CREATE TEMP FUNCTION batched_string(ids ARRAY<INT64>) AS (
      ARRAY(
        SELECT STRING_AGG('' || id) FROM (
          SELECT DIV(offset, 5) batch, id 
            FROM UNNEST(ids) id WITH offset
        ) GROUP BY batch
      )
    );
    
    SELECT * FROM UNNEST(batched_string([1, 2, 3, 4, 5, 6, 7]));
    

    Table functions

    • return result as a Table.
    • note that a table function shouldn't be a temp function.
    CREATE OR REPLACE TABLE FUNCTION `your-project.dataset.batched_string`(ids ARRAY<INT64>) AS (
      SELECT STRING_AGG('' || id) batched FROM (
        SELECT DIV(offset, 5) batch, id 
          FROM UNNEST(ids) id WITH offset
      ) GROUP BY batch
    );
    
    SELECT * FROM `your-project.dataset.batched_string`([1, 2, 3, 4, 5, 6, 7]);
    

    enter image description here