I have a table which contains measures taken on random dates, partitioned by the site at which they were taken.
site | date | measurement |
---|---|---|
AB1234 | 2022-12-09 | 1 |
AB1234 | 2022-06-11 | 2 |
AB1234 | 2019-05-22 | 3 |
AB1234 | 2017-01-30 | 4 |
CD5678 | 2022-11-01 | 5 |
CD5678 | 2020-04-10 | 6 |
CD5678 | 2017-04-10 | 7 |
CD5678 | 2017-01-22 | 8 |
In order to calculate a year on year growth, I want to have an additional field for each record which contains the previous measurement at that site. The challenging part is that I only want the previous which occurred more than a year in the past.
Like so:
site | date | measurement | previous_measurement |
---|---|---|---|
AB1234 | 2022-12-09 | 1 | 3 |
AB1234 | 2022-06-11 | 2 | 3 |
AB1234 | 2019-05-22 | 3 | 4 |
AB1234 | 2017-01-30 | 4 | NULL |
CD5678 | 2022-11-01 | 5 | 6 |
CD5678 | 2020-04-10 | 6 | 7 |
CD5678 | 2017-04-10 | 7 | NULL |
CD5678 | 2017-01-22 | 8 | NULL |
It feels like it should be possible with a window function, but I can't work it out.
Please help :(
Amazon Athena engine version 3 incorporated from Trino. If it has incorporated full support for frame type RANGE for window functions you can use that:
-- sample data
with dataset(site, date, measurement) as (
values ('AB1234', date '2022-12-09', 1),
('AB1234', date '2022-06-11', 2),
('AB1234', date '2019-05-22', 3),
('AB1234', date '2017-01-30', 4),
('CD5678', date '2022-11-01', 5),
('CD5678', date '2020-04-10', 6),
('CD5678', date '2017-04-10', 7),
('CD5678', date '2017-01-22', 8)
)
-- query
select *,
last_value(measurement) over (
partition by site
order by date
RANGE BETWEEN UNBOUNDED PRECEDING AND interval '1' year PRECEDING)
from dataset;
Output:
site | date | measurement | _col3 |
---|---|---|---|
CD5678 | 2017-01-22 | 8 | NULL |
CD5678 | 2017-04-10 | 7 | NULL |
CD5678 | 2020-04-10 | 6 | 7 |
CD5678 | 2022-11-01 | 5 | 6 |
AB1234 | 2017-01-30 | 4 | NULL |
AB1234 | 2019-05-22 | 3 | 4 |
AB1234 | 2022-06-11 | 2 | 3 |
AB1234 | 2022-12-09 | 1 | 3 |