Search code examples
google-bigquerybigquery-udf

pass an ARRAY<STRUCT> to a js UDF in bigquery sql


I'm trying to get data from some tables, and pass them to a JavaScript UDF in bigquery.

I wrote the following code but I can't seem to understand the correct syntax to to store the result of my select in the wanted structure, then how to pass it to my udf function.

DECLARE arg1 ARRAY<STRING>;
DECLARE arg2 ARRAY<STRUCT <col1 STRING, col2 STRING> >;
DECLARE res1 ARRAY<STRING>;

SET arg1 = ARRAY<STRING>["Adams", "Joseph", "Davis", "Mary", "Jesus"] ;


CREATE TEMP FUNCTION  myfunction(arg1 ARRAY<STRING> , arg2 ARRAY<STRUCT<col1 STRING, col2 STRING> > ) 
RETURNS ARRAY<STRING>
LANGUAGE js AS '''
return ["a", "b", "c"]
''';

SET ARG2 = (SELECT AS STRUCT(
WITH TBL1 AS
 (SELECT 'Adams' as LastName, 50 as SchoolID UNION ALL
  SELECT 'Buchanan', 52 UNION ALL
  SELECT 'Coolidge', 52 UNION ALL
  SELECT 'Davis', 51 UNION ALL
  SELECT 'Eisenhower', 77)
SELECT LastName as col1, SchoolID as col2 FROM TBL1));

SET res1 = select res FROM UNNEST(myfunction( arg1, arg2 )) res;

Solution

  • There are a few syntax errors with the code you shared but based on the procedural logic, I've made a few adjustments to a more declarative style and corrected these.

    1. I have kept arg1 even though we may also include this in the TBL1 CTE if desired
    2. arg2 is now created by selected the values from TBL1 as a STRUCT and the columns are renamed using as col1 and as col2
    3. I've casted SchoolId as a string since your UDF accepts col2 as string
    4. I've added another field id for demo purposes as I will group or aggregate the data by this id column to get the array of struct arg2 parameter required by your UDF
    5. I finally select id and call your UDF with the parameters in the correct format. array_agg is used to create the array of structs here. The result is in a column named res2
    DECLARE arg1 ARRAY<STRING>;
    
    SET arg1 = ARRAY<STRING>["Adams", "Joseph", "Davis", "Mary", "Jesus"] ;
    
    
    CREATE TEMP FUNCTION  myfunction(arg1 ARRAY<STRING> , arg2 ARRAY<STRUCT<col1 STRING, col2 STRING> > ) 
    RETURNS ARRAY<STRING>
    LANGUAGE js AS '''
    return ["a", "b", "c"]
    ''';
    
    WITH TBL1 AS (
      SELECT 'Adams' as LastName, 50 as SchoolID UNION ALL
      SELECT 'Buchanan', 52 UNION ALL
      SELECT 'Coolidge', 52 UNION ALL
      SELECT 'Davis', 51 UNION ALL
      SELECT 'Eisenhower', 77
    ),
    tbl2 as (
        SELECT 
            1 as id, 
            STRUCT( 
                LastName as col1, 
                CAST(SchoolID as STRING) as col2
            ) as arg2 
        FROM TBL1
    )
    select id, myfunction(arg1, array_agg( arg2)) as res1 from tbl2
    group by id
    
    
    

    Let me know if this works for you.