I'm doing year on year
analysis by doing joins. I'm joining the same table for every year but since I'm using another tool to build my SQL its not so 'dynamic'. If I could solve this with window functions it would be better. So any suggestion is appreciated :D
The idea is to do this by hour. That is, I want to compare sales lets say from 2022-04-05 hour 8 with sales from 2021-04-05 hour 8 and from 2020-04-05 hour 8.
My data is aggregated by hour:
Store | Timestamp | Sales |
---|---|---|
1 | 2019-04-05T08:00:00Z | 10000 |
1 | 2020-04-05T08:00:00Z | 12000 |
1 | 2021-04-05T08:00:00Z | 15000 |
1 | 2022-04-05T08:00:00Z | 20000 |
2 | 2019-04-05T08:00:00Z | 13000 |
2 | 2020-04-05T08:00:00Z | 16000 |
2 | 2021-04-05T08:00:00Z | 19000 |
2 | 2022-04-05T08:00:00Z | 22000 |
Desired result (order may begin with this year) timestamps are not needed. I added them just to clarify:
Store | Timestamp_1 | Sales_1 | Timestamp_2 | Sales_2 | Timestamp_3 | Sales_3 |
---|---|---|---|---|---|---|
1 | 2019-04-05T08:00:00Z | 10000 | 2020-04-05T08:00:00Z | 12000 | 2021-04-05T08:00:00Z | 15000 |
2 | 2019-04-05T08:00:00Z | 13000 | 2020-04-05T08:00:00Z | 16000 | 2021-04-05T08:00:00Z | 19000 |
Any ideas? Thanks in Advance
Not really your answer but if you only have one days worth of hours
SELECT
store
,hour(date)
,array_agg(object_construct(date::text, sales)) within group (order by date) as hour_history
FROM data_table
GROUP BY 1,2
ORDER BY 1,2;
gives:
STORE | HOUR(DATE) | HOUR_HISTORY |
---|---|---|
1 | 8 | [ { "2019-04-05 08:00:00.000": 10000 }, { "2020-04-05 08:00:00.000": 12000 }, { "2021-04-05 08:00:00.000": 15000 }, { "2022-04-05 08:00:00.000": 20000 } ] |
2 | 8 | [ { "2019-04-05 08:00:00.000": 13000 }, { "2020-04-05 08:00:00.000": 16000 }, { "2021-04-05 08:00:00.000": 19000 }, { "2022-04-05 08:00:00.000": 22000 } ] |
Thus:
SELECT store
,hour_history[0].date::timestamp as Timestamp_1
,hour_history[0].sales::number as Sales_1
,hour_history[1].date::timestamp as Timestamp_2
,hour_history[1].sales::number as Sales_2
,hour_history[2].date::timestamp as Timestamp_3
,hour_history[2].sales::number as Sales_3
FROM (
SELECT
store
,hour(date)
,array_agg(object_construct('date', date::text, 'sales', sales)) within group (order by date) as hour_history
FROM data_table
GROUP BY 1,2
)
ORDER BY 1;
does give:
STORE | TIMESTAMP_1 | SALES_1 | TIMESTAMP_2 | SALES_2 | TIMESTAMP_3 | SALES_3 |
---|---|---|---|---|---|---|
1 | 2019-04-05 08:00:00.000 | 10,000 | 2020-04-05 08:00:00.000 | 12,000 | 2021-04-05 08:00:00.000 | 15,000 |
2 | 2019-04-05 08:00:00.000 | 13,000 | 2020-04-05 08:00:00.000 | 16,000 | 2021-04-05 08:00:00.000 | 19,000 |
If you have many months, day, hours worth of data this works for the inner loop:
SELECT
store
,month(date)
,day(date)
,hour(date)
,array_agg(object_construct('date', date::text, 'sales', sales)) within group (order by date) as hour_history
FROM data_table
GROUP BY 1,2,3,4
AKA:
WITH data_table AS (
SELECT * FROM VALUES
(1,'2019-04-05T08:00:00Z'::timestamp,10000),
(1,'2020-04-05T08:00:00Z'::timestamp,12000),
(1,'2021-04-05T08:00:00Z'::timestamp,15000),
(1,'2022-04-05T08:00:00Z'::timestamp,20000),
(1,'2019-03-05T08:00:00Z'::timestamp,10001),
(1,'2020-03-05T08:00:00Z'::timestamp,12001),
(1,'2021-03-05T08:00:00Z'::timestamp,15001),
(1,'2022-03-05T08:00:00Z'::timestamp,20001),
(1,'2019-04-04T08:00:00Z'::timestamp,10002),
(1,'2020-04-04T08:00:00Z'::timestamp,12002),
(1,'2021-04-04T08:00:00Z'::timestamp,15002),
(1,'2022-04-04T08:00:00Z'::timestamp,20002),
(2,'2019-04-05T08:00:00Z'::timestamp,13000),
(2,'2020-04-05T08:00:00Z'::timestamp,16000),
(2,'2021-04-05T08:00:00Z'::timestamp,19000),
(2,'2022-04-05T08:00:00Z'::timestamp,22000)
t(store, date, sales)
)
SELECT store
,hour_history[0].date::timestamp as Timestamp_1
,hour_history[0].sales::number as Sales_1
,hour_history[1].date::timestamp as Timestamp_2
,hour_history[1].sales::number as Sales_2
,hour_history[2].date::timestamp as Timestamp_3
,hour_history[2].sales::number as Sales_3
FROM (
SELECT
store
,month(date)
,day(date)
,hour(date)
,array_agg(object_construct('date', date::text, 'sales', sales)) within group (order by date) as hour_history
FROM data_table
GROUP BY 1,2,3,4
)
ORDER BY 1;
gives:
STORE | TIMESTAMP_1 | SALES_1 | TIMESTAMP_2 | SALES_2 | TIMESTAMP_3 | SALES_3 |
---|---|---|---|---|---|---|
1 | 2019-04-05 08:00:00.000 | 10,000 | 2020-04-05 08:00:00.000 | 12,000 | 2021-04-05 08:00:00.000 | 15,000 |
1 | 2019-03-05 08:00:00.000 | 10,001 | 2020-03-05 08:00:00.000 | 12,001 | 2021-03-05 08:00:00.000 | 15,001 |
1 | 2019-04-04 08:00:00.000 | 10,002 | 2020-04-04 08:00:00.000 | 12,002 | 2021-04-04 08:00:00.000 | 15,002 |
2 | 2019-04-05 08:00:00.000 | 13,000 | 2020-04-05 08:00:00.000 | 16,000 | 2021-04-05 08:00:00.000 | 19,000 |
What you will note is you example has 4 years of data, and you are throwing away to 2022 data.