Search code examples
sqlgoogle-bigquerynesteduser-defined-functionsunnest

How to create a BigQuery UDF function to UNNEST multiple REPEATED levels?


Background: I have a BigQuery schema with several REPEATED levels and trying create UDF functions to facilitate querying.

Able to handle 1 level: when I'm dealing with only 1 REPEATED level I'm able to write the below UDF function which works:

enter image description here

CREATE TEMP FUNCTION struct_array_struct_scalar(data ANY TYPE) AS (
  (
    SELECT ARRAY_AGG(l1.element.id)
    FROM UNNEST(data.list) as l1
  )
);
SELECT struct_array_struct_scalar(my_field) as my_field_ids FROM my_table

The above works fine and returns an ARRAY of ids.

Struggling with multiple levels: now when I have multiple nested REPEATED fields this is where the challenge arises

enter image description here

Here the values I'm interested in are nested under my_field.list[].element.rates.list[].element.rate.double. There are only 2 REPEATED levels (list[]), everything else is nested under records so using the . notation to get them is trivial.

So far I have:

CREATE TEMP FUNCTION foo(data ANY TYPE) AS (
  (
    SELECT ARRAY_AGG(l2.element.rate.double)
    FROM UNNEST(
      (SELECT l1.element.rates.list
      FROM UNNEST(data.list) AS l1)
    ) AS l2
  )
);

So I have:

  • 2 nested subqueries with UNNEST to handle the 2 REPEATED fields with respective aliases l1 and l2
  • . notation to go access nested records within those lists

Although the syntax seems good I get the following error:

Scalar subquery produced more than one element

Q: Can you please give me a clue as to what I need to change to return an ARRAY of rate.double?


Solution

  • Hope below give you some direction for your problem. (It might need to be modified depending on your real data)

    CREATE TEMP FUNCTION foo(data ANY TYPE) AS (
      ARRAY(
        SELECT l2.element.rate.double 
          FROM UNNEST(data.list) l1, UNNEST(l1.element.rates.list) l2
         WHERE l2.element.rate.double IS NOT NULL
      )
    );
    
    -- another udf which returns same result as above.
    CREATE TEMP FUNCTION foo0(data ANY TYPE) AS ((
      SELECT ARRAY_AGG(l2.element.rate.double IGNORE NULLS)
        FROM UNNEST(data.list) l1, UNNEST(l1.element.rates.list) l2
    ));
    
    SELECT foo(my_field) FROM my_table;
    

    Query results

    enter image description here

    Sample Data

    enter image description here