I have a query where I am trying to find the average of energy consumption over a group of service locations that each fall within ±5% of each other for a particular month. I know that range and rows function can't define a particular range of numbers and instead use preceding and following clauses but how can I write this query to suit my needs?
WITH energy_consumption_cte AS (
SELECT
sl.id,
sl.sqft,
SUM(CASE WHEN dl.log_label = 'energy use' THEN dl.value END) AS monthly_consumption
FROM
serv_location sl
INNER JOIN
device d ON d.service_loc_id = sl.id
INNER JOIN
device_log dl ON dl.device_id = d.id
WHERE
dl."timestamp" >= '2022-08-01 00:01:00.000'::timestamp
AND dl."timestamp" < '2022-09-01 00:01:00.000'::timestamp
AND sl.sqft IS NOT NULL
GROUP BY
sl.id, sl.sqft
)
SELECT
id,
sqft,
monthly_consumption,
(monthly_consumption / AVG(monthly_consumption) OVER w) * 100
AS average_monthly_for_sqft_range
FROM
energy_consumption_cte
window w as (PARTITION BY sqft ORDER BY sqft range between
ROUND(sqft * 0.95) and ROUND(sqft * 1.05))
ORDER BY
sqft ASC;
Window frame range
mode can define particular range. To use an aggregate function that'll include values from records 5% up/down from the current one, you should think range between 0.05*sqft preceding and 0.05*sqft following
. Problem is, you can't use variables there.
In the offset
PRECEDING
and offsetFOLLOWING
frame options, the offset must be an expression not containing any variables, aggregate functions, or window functions.
If you try to refer to the sqft
column/field, you'll get this:
ERROR: argument of RANGE must not contain variables LINE 27: RANGE BETWEEN sqft*0.05 PRECEDING ^
You can work around that limitation by first calculating % change from row to row using lag()
or lead()
, then collecting that into a stepping sum()
, so that it maintains the initial order of sqft
. Once you have that, you can use constant offset in the frame clause to reach 5% back and forth. Demo at db<>fiddle:
WITH energy_consumption_cte AS (
SELECT sl.id,
sl.sqft,
SUM(dl.value)filter(where dl.log_label='energy use') AS monthly_consumption
FROM serv_location sl
INNER JOIN device d ON d.service_loc_id = sl.id
INNER JOIN device_log dl ON dl.device_id = d.id
WHERE dl."timestamp" >= '2022-08-01 00:01:00.000'::timestamp
AND dl."timestamp" < '2022-09-01 00:01:00.000'::timestamp
AND sl.sqft IS NOT NULL
GROUP BY sl.id, sl.sqft
),square_footage_steps_cte AS (
SELECT *, 1-lag(sqft,1,sqft)over(order by sqft)/sqft::numeric AS sqft_step
FROM energy_consumption_cte
ORDER BY sqft ASC
),square_footage_steps_cumulative_cte AS (
SELECT *, sum(sqft_step)over(order by sqft) AS sqft_step_sum
FROM square_footage_steps_cte
ORDER BY sqft ASC)
SELECT id,
sqft,
monthly_consumption,
AVG(monthly_consumption) OVER w AS average_monthly_for_sqft_range,
array_agg(id)over w as rows_in_frame
FROM square_footage_steps_cumulative_cte
WINDOW w as (ORDER BY sqft_step_sum
RANGE BETWEEN 0.05 PRECEDING
AND 0.05 FOLLOWING)
ORDER BY sqft ASC;
id | sqft | monthly_consumption | average_monthly_for_sqft_range | rows_in_frame |
---|---|---|---|---|
1 | 100 | 400 | 700.0000000000000000 | {1,2} |
2 | 105 | 1000 | 1000.0000000000000000 | {1,2,3} |
3 | 110 | 1600 | 1600.0000000000000000 | {2,3,4} |
4 | 115 | 2200 | 1900.0000000000000000 | {3,4} |
5 | 200 | 2800 | 3100.0000000000000000 | {5,6} |
6 | 210 | 3400 | 3100.0000000000000000 | {5,6} |