I've noticed from the following two window functions:
WITH sales AS (
select 2020 as year, 100 as revenue UNION
select 2021 as year, 200 as revenue UNION
select 2022 as year, 300 as revenue UNION
select 2023 as year, 100 as revenue
)
SELECT JSON_ARRAYAGG(revenue) OVER (order by year) FROM sales LIMIT 1;
# revenue_list
[100]
And this:
WITH sales AS (
select 2020 as year, 100 as revenue UNION
select 2021 as year, 200 as revenue UNION
select 2022 as year, 300 as revenue UNION
select 2023 as year, 100 as revenue
)
SELECT JSON_ARRAYAGG(revenue) OVER () revenue_list FROM sales LIMIT 1;
# revenue_list
[100, 200, 300, 100]
It seems like the default behavior of the frame is:
()
-- then default frame is UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
(order by year)
or (partition by year)
, etc -- then default frame is UNBOUNDED PRECEDING AND CURRENT ROW
.Is that a correct understanding of the default frame behavior? If so, why was that choice made? (For example, why not just have it always unbounded unless specified by the user?)
The LIMIT 1
clause in the first query is misleading you. Consider the full query without any LIMIT
clause:
SELECT JSON_ARRAYAGG(revenue) OVER (ORDER BY year) FROM sales;
The result set generated from this is:
The reason that [100]
gets returned...is totally arbitrary. You don't have an ORDER BY
clause with your LIMIT
query, so the "first" record we see in the result set just happens to get returned (but it could, in theory, be any of the four records).
The OVER ()
version would just render the entire table aggregated into a JSON array, but would display that value for each record.
SELECT JSON_ARRAYAGG(revenue) OVER () FROM sales; -- [100, 200, 300, 100] each record
SELECT JSON_ARRAYAGG(revenue) FROM sales; -- [100, 200, 300, 100] one record