Background: I have a BigQuery schema
with several REPEATED
levels and trying create UDF functions
to facilitate querying.
Able to handle 1 level: when I'm dealing with only 1 REPEATED level I'm able to write the below UDF function which works:
CREATE TEMP FUNCTION struct_array_struct_scalar(data ANY TYPE) AS (
(
SELECT ARRAY_AGG(l1.element.id)
FROM UNNEST(data.list) as l1
)
);
SELECT struct_array_struct_scalar(my_field) as my_field_ids FROM my_table
The above works fine and returns an ARRAY
of ids.
Struggling with multiple levels: now when I have multiple nested REPEATED
fields this is where the challenge arises
Here the values I'm interested in are nested under my_field.list[].element.rates.list[].element.rate.double
. There are only 2 REPEATED
levels (list[]), everything else is nested under records so using the .
notation to get them is trivial.
So far I have:
CREATE TEMP FUNCTION foo(data ANY TYPE) AS (
(
SELECT ARRAY_AGG(l2.element.rate.double)
FROM UNNEST(
(SELECT l1.element.rates.list
FROM UNNEST(data.list) AS l1)
) AS l2
)
);
So I have:
UNNEST
to handle the 2 REPEATED
fields with respective aliases l1
and l2
.
notation to go access nested records within those listsAlthough the syntax seems good I get the following error:
Scalar subquery produced more than one element
Q: Can you please give me a clue as to what I need to change to return an ARRAY of rate.double?
Hope below give you some direction for your problem. (It might need to be modified depending on your real data)
CREATE TEMP FUNCTION foo(data ANY TYPE) AS (
ARRAY(
SELECT l2.element.rate.double
FROM UNNEST(data.list) l1, UNNEST(l1.element.rates.list) l2
WHERE l2.element.rate.double IS NOT NULL
)
);
-- another udf which returns same result as above.
CREATE TEMP FUNCTION foo0(data ANY TYPE) AS ((
SELECT ARRAY_AGG(l2.element.rate.double IGNORE NULLS)
FROM UNNEST(data.list) l1, UNNEST(l1.element.rates.list) l2
));
SELECT foo(my_field) FROM my_table;
Query results