I have a table like this
+------+------+-------+--+
| Name | Date | Price | |
+------+------+-------+--+
| x | d1 | 50 | |
| x | d2 | 45 | |
| x | d3 | 55 | |
| x | d4 | 40 | |
| x | d5 | 48 | |
| x | d6 | 45 | |
| x | d7 | 44 | |
| y | d1 | 200 | |
| y | d2 | 205 | |
| y | d3 | 204 | |
| y | d4 | 189 | |
| y | d5 | 185 | |
| y | d6 | 187 | |
| y | d7 | 182 | |
+------+------+-------+--+
I want to find high price for each name and date in next 3 days.
OutPut:
+------+------+-------+---------+-----------+--+
| Name | Date | Price | High_pr | High_date | |
+------+------+-------+---------+-----------+--+
| x | d1 | 50 | 55 | d3 | |
| x | d2 | 45 | 55 | d3 | |
| x | d3 | 55 | 55 | d3 | |
| x | d4 | 40 | 48 | d5 | |
| x | d5 | 48 | 48 | d5 | |
| x | d6 | 45 | 45 | d6 | |
| x | d7 | 44 | 44 | d7 | |
| y | d1 | 200 | 205 | d2 | |
| y | d2 | 205 | 205 | d2 | |
| y | d3 | 204 | 204 | d3 | |
| y | d4 | 189 | 189 | d4 | |
| y | d5 | 185 | 187 | d6 | |
| y | d6 | 187 | 187 | d6 | |
| y | d7 | 182 | 182 | d7 | |
+------+------+-------+---------+-----------+--+
Tried using window function with following '3 day'
but not working.
The table is really big with many names spanning 5 years of data, need some optimum solution. Thank You
PS: Using PostgreSQL 12.4.
I am wondering if this can be done reasonably without the use of a lateral join. Window functions don't seem to be optimal, but the following does work using a correlated subquery:
select t.name, t.date, t.price, max_price_3,
(select pd.date
from unnest(t.pd) pd (date date, price int)
where pd.price = t.max_price_3
limit 1
) as date_3
from (select t.*,
max( price ) over (partition by name order by date range between current row and interval '3 day' following) as max_price_3,
array_agg( (date, price) ) over (partition by name order by date range between current row and interval '3 day' following) as pd
from t
) t ;
Here is a db<>fiddle.
There is a way to do this using only window functions . . . but using generate_series()
. This is a little complicated to follow, because the original date/price is spread backwards through the days in the period -- then the maximum row is taken:
select name, dte, price, price_3, date as date_3
from (select t.*, dte, min(date) over (partition by name) as min_date,
max(t.price) over (partition by t.name, gs.dte) as price_3,
row_number() over (partition by t.name, gs.dte order by price desc) as seqnum
from t cross join
generate_series(date, date - interval '3 day', interval '-1 day') gs(dte)
) t
where seqnum = 1 and dte >= min_date
order by t.name, t.date;