I’m working with a Snowflake table where each row contains structured data in two columns. Here’s an example of the table with 3 rows:
| id | data_1 | data_2 |
|----------------------------------------------|
| 1 | 2024-01-01, 1 | 2023-02-01, 2 |
| | 2024-01-02, 2 | 2023-02-02, 2 |
|----------------------------------------------|
| 2 | 2024-02-03, 3 | 2023-02-01, 20 |
| | 2024-02-02, 4 | 2023-02-02, 20 |
|----------------------------------------------|
| 3 | 2024-03-01, 5 | 2023-02-01, 30 |
| | 2024-03-02, 6 | 2023-02-02, 30 |
|----------------------------------------------|
Conceptually, in each row, columns data_1
and data_2
contain an array of structured data where each element has a date and an integer value associated with it.
I want to query this table and obtain the following results:
data_1
: the value associated with the latest date.data_2
: the sum of all values.Note: I need the date in data_2
for other operations.
Desired result of the query:
| id | data_1 | data_2 |
|----------------------|
| 1 | 2 | 4 |
|----------------------|
| 2 | 3 | 40 |
|----------------------|
| 3 | 6 | 60 |
|----------------------|
Questions:
data_1
and data_2
? Ideally, I prefer a data type where I can address each sub-element using a name (something akin to data_1.date
and data_1.value
, so perhaps a structured object
or a structured map
).We can do a lot with filter
and transform
, but we still need a reduce
- so we will implement that with a JS UDF in the meantime.
with data as (
select 1 id, [{'date':'2024-01-01', 'v':1}, {'date':'2024-01-02', 'v':2}] data_1, [{'date':'2023-02-01', 'v':2}, {'date':'2023-02-02', 'v':2}] data_2
union all select 2 id, [{'date':'2024-02-03', 'v':3}, {'date':'2024-02-02', 'v':4}], [{'date':'2023-02-01', 'v':20}, {'date':'2023-02-02', 'v':20}]
union all select 3 id, [{'date':'2024-03-01', 'v':5}, {'date':'2024-03-02', 'v':6}], [{'date':'2023-02-01', 'v':30}, {'date':'2023-02-02', 'v':30}]
)
select arrays_to_object(transform(data_1, x -> x['date']), transform(data_1, x -> x['v']))[array_max(transform(data_1, y -> y['date']))]
, array_sum_js(transform(data_2, x -> x['v']))
from data
;
The JS UDF:
create function array_sum_js(X array)
returns float
language javascript as
$$
return X.reduce((a, b) => a + b, 0);
$$;