There is a portion of the SQL query that fetches a value from the previous row.
...
lag(rs.processed_value, 1) OVER (
PARTITION BY rs.rloi_id, rs.qualifier
ORDER BY rs.rloi_id, rs.qualifier, rs.timestamp
) AS previous_value,
rs.parent_rank FROM rs WHERE rs.parent_rank = ANY (
ARRAY[1::bigint, 2::bigint]
)
...
roi_id | qualifier | processed_value | previous_value | parent_rank | timestamp |
---|---|---|---|---|---|
1001 | STAGE | 0.76 | NULL | 2 | 2023-01-04 |
1001 | STAGE | 0.78 | 0.76 | 1 | 2023-01-05 |
1001 | STAGE | 0.66 | NULL | 2 | 2021-01-04 |
1001 | STAGE | 0.68 | 0.66 | 1 | 2021-01-05 |
What I would like to do now is fetch the previous 4 rows (parent_rank
's 2-5), SUM() and divide them by 4
to get an average, and return that as the previous_value
.
I am able to return the previous 4 values by increasing the ARRAY
section of the query.
...
rs.parent_rank FROM rs WHERE rs.parent_rank = ANY (
ARRAY[1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint]
)
...
roi_id | processed_value | previous_value | parent_rank | timestamp |
---|---|---|---|---|
1001 | 0.70 | NULL | 5 | 2023-01-01 |
1001 | 0.72 | 0.70 | 4 | 2023-01-02 |
1001 | 0.74 | 0.72 | 3 | 2023-01-03 |
1001 | 0.76 | 0.74 | 2 | 2023-01-04 |
1001 | 0.78 | 0.76 | 1 | 2023-01-05 |
1001 | 0.60 | NULL | 5 | 2021-01-01 |
1001 | 0.62 | 0.60 | 4 | 2021-01-02 |
1001 | 0.64 | 0.62 | 3 | 2021-01-03 |
1001 | 0.66 | 0.64 | 2 | 2021-01-04 |
1001 | 0.68 | 0.66 | 1 | 2021-01-05 |
The desired outcome is to have the previous_value
for rows of parent_rank = 1
, to be the new averaged value.
Desired outcome
roi_id | processed_value | previous_value | parent_rank | timestamp |
---|---|---|---|---|
1001 | 0.76 | NULL | 2 | 2023-01-04 |
1001 | 0.78 | 0.73* | 1 | 2023-01-05 |
1001 | 0.66 | NULL | 2 | 2021-01-04 |
1001 | 0.68 | 0.63* | 1 | 2021-01-05 |
(*the averaged value)
Portion of the query
WITH ranked_all_value_summaries AS (
SELECT tvp.rloi_id,
tv.processed_value,
tv.timestamp,
tv.error,
rank() OVER (PARTITION BY tvp.rloi_id, tvp.qualifier ORDER BY tv.timestamp DESC, tv.telemetry_value_id DESC) AS parent_rank
FROM sls_telemetry_value tv
JOIN sls_telemetry_value_parent tvp ON tv.telemetry_value_parent_id = tvp.telemetry_value_parent_id
WHERE lower(tvp.parameter) = 'water level'::text
AND lower(tvp.units) !~~ '%deg%'::text
AND lower(tvp.qualifier) !~~ '%height%'::text
AND lower(tvp.qualifier) <> 'crest tapping'::text
),
latest_value_summaries_with_previous_value AS (
SELECT ranked_all_value_summaries.rloi_id,
ranked_all_value_summaries.qualifier,
ranked_all_value_summaries.processed_value,
ranked_all_value_summaries.timestamp,
lag(ranked_all_value_summaries.processed_value, 1) OVER (
PARTITION BY ranked_all_value_summaries.rloi_id, ranked_all_value_summaries.qualifier
ORDER BY ranked_all_value_summaries.rloi_id, ranked_all_value_summaries.qualifier, ranked_all_value_summaries.timestamp
) AS previous_value
ranked_all_value_summaries.parent_rank FROM ranked_all_value_summaries WHERE ranked_all_value_summaries.parent_rank = ANY (
ARRAY[1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint]
)
) SELECT * FROM latest_value_summaries_with_previous_value;
Basically, you can adjust the window frame and just use avg()
:
avg(rs.processed_value OVER (PARTITION BY rs.rloi_id, rs.qualifier
ORDER BY rs.timestamp
ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) -- !
You don't need to repeat columns listed in the PARTITION
clause in the ORDER BY
clause.
It's unclear what the result should be when there are fewer than 4 previous rows available.
Related: