I have the following records:
id  run_hour               performance_hour       value
2   "2017-06-25 09:00:00"  "2017-06-25 07:00:00"  6
2   "2017-06-25 09:00:00"  "2017-06-25 08:00:00"  5
1   "2017-06-25 09:00:00"  "2017-06-25 08:00:00"  5
2   "2017-06-25 08:00:00"  "2017-06-25 07:00:00"  5
1   "2017-06-25 08:00:00"  "2017-06-25 07:00:00"  5
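To reproduce the example, the records above could be loaded with something like the following. This is only a sketch: the table name metrics is taken from the query further down, while the column types (integer, timestamp) are assumptions, since the real schema is not shown.

```sql
-- Assumed schema: the column types are a guess, only the column names are given.
create table metrics
(
  id               integer   not null,
  run_hour         timestamp not null,
  performance_hour timestamp not null,
  value            integer   not null
);

-- The five sample records listed above.
insert into metrics (id, run_hour, performance_hour, value) values
  (2, '2017-06-25 09:00:00', '2017-06-25 07:00:00', 6),
  (2, '2017-06-25 09:00:00', '2017-06-25 08:00:00', 5),
  (1, '2017-06-25 09:00:00', '2017-06-25 08:00:00', 5),
  (2, '2017-06-25 08:00:00', '2017-06-25 07:00:00', 5),
  (1, '2017-06-25 08:00:00', '2017-06-25 07:00:00', 5);
```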
We run every hour and look at the result of each id for the current hour and the previous hours.
We insert a new record only if there was a change from the previous hour's run (we don't want to overwrite the value, because we want to measure the value as seen after 1 hour, 2 hours, etc.).
For each id, I would like to sum the values, taking the latest available value (sorted by run_hour) for each performance hour.
In the above example, id 1 has no record for run 9:00 and performance hour 7:00, because the value was the same as in run 8:00 for performance hour 7:00.
In the above example, if I ask for sum of values of run 2017-06-25 09:00:00 I would expect to get:
id, value
1 10
2 11
For id 1 it's 10, calculated as (run_hour<2017-06-25 08:00:00> + run_hour<2017-06-25 09:00:00>), and for id 2 it's 11, calculated as (run_hour<2017-06-25 09:00:00> + run_hour<2017-06-25 09:00:00>).
I wrote the following query:
    select distinct on (id, run_hour) id, sum(value)
    from metrics
    where run_hour <= '2017-06-25 09:00'
      and performance_hour >= '2017-06-25 07:00'
      and performance_hour < '2017-06-25 09:00'
    group by id
    order by id, run_hour
However, I get an error saying that run_hour also has to be in the GROUP BY clause. If I add it, I get incorrect data: the result also includes rows for previous hours, which I don't need. I only need the latest run hour that has data.
How can I use "distinct on" with group by?
The task is very complicated. Let's say you want the performance hours 7:00 till 9:00 from the following data:
id  run_hour               performance_hour       value
2   "2017-06-25 09:00:00"  "2017-06-25 06:00:00"  6
2   "2017-06-25 09:00:00"  "2017-06-25 10:00:00"  5
The expected result would be 18 (6 for 7:00 + 6 for 8:00 + 6 for 9:00), all based on the 6:00 record, which itself is outside the desired time range.
We need a recursive CTE starting from the first wanted performance hour per id till the last wanted one. Thus we build records that don't exist and that we can sum up later.
    with recursive cte(id, run_hour, performance_hour, value) as
    (
      -- Anchor: per id, the latest record at or before the first wanted hour,
      -- moved forward to that first wanted hour.
      select *
      from
      (
        select distinct on (id)
          id,
          run_hour,
          greatest(performance_hour, timestamp '2017-06-25 07:00') as performance_hour,
          value
        from metrics
        where run_hour = timestamp '2017-06-25 09:00'
          and performance_hour <= timestamp '2017-06-25 07:00'
        order by id, metrics.performance_hour desc
      ) start_by_id
      union all
      -- Step: advance one hour; use that hour's record if it exists,
      -- otherwise carry the previous value forward.
      select
        cte.id,
        cte.run_hour,
        cte.performance_hour + interval '1 hour' as performance_hour,
        coalesce(m.value, cte.value) as value
      from cte
      left join metrics m on m.id = cte.id
                         and m.run_hour = cte.run_hour
                         and m.performance_hour = cte.performance_hour + interval '1 hour'
      where cte.performance_hour < timestamp '2017-06-25 09:00'
    )
    -- Sum the existing and the generated rows per id.
    select id, sum(value)
    from cte
    group by id;
Rextester link: http://rextester.com/PHC88770
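As for the literal question of combining "distinct on" with group by: one possible approach is to apply distinct on in a derived table first and aggregate in the outer query. The following is only a sketch for the simpler case of the question's sample data, where every wanted performance hour has a record in some run up to the requested one. It does not build missing performance hours the way the recursive query above does. The alias latest_per_hour is made up for illustration.

```sql
select id, sum(value) as value
from
(
  -- Per (id, performance_hour), keep only the row from the latest run
  -- that is not later than the requested run hour.
  select distinct on (id, performance_hour)
    id,
    performance_hour,
    value
  from metrics
  where run_hour <= timestamp '2017-06-25 09:00'
    and performance_hour >= timestamp '2017-06-25 07:00'
    and performance_hour < timestamp '2017-06-25 09:00'
  order by id, performance_hour, run_hour desc
) latest_per_hour
group by id
order by id;
```

With the five sample records from the question this gives 10 for id 1 (5 from the 8:00 run for 7:00 plus 5 from the 9:00 run for 8:00) and 11 for id 2 (6 + 5, both from the 9:00 run).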