Search code examples
sqlpostgresqlwindow-functions

Selecting and averaging previous values of a row


There is a portion of the SQL query that fetches a value from the previous row.

...
lag(rs.processed_value, 1) OVER (
  PARTITION BY rs.rloi_id, rs.qualifier
  ORDER BY rs.rloi_id, rs.qualifier, rs.timestamp
) AS previous_value,
rs.parent_rank FROM rs WHERE rs.parent_rank = ANY (
  ARRAY[1::bigint, 2::bigint]
)
...
roi_id qualifier processed_value previous_value parent_rank timestamp
1001 STAGE 0.76 NULL 2 2023-01-04
1001 STAGE 0.78 0.76 1 2023-01-05
1001 STAGE 0.66 NULL 2 2021-01-04
1001 STAGE 0.68 0.66 1 2021-01-05

What I would like to do now is fetch the previous 4 rows (parent_rank's 2-5), SUM() and divide them by 4 to get an average, and return that as the previous_value.

I am able to return the previous 4 values by increasing the ARRAY section of the query.

...
rs.parent_rank FROM rs WHERE rs.parent_rank = ANY (
  ARRAY[1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint]
)
...
roi_id processed_value previous_value parent_rank timestamp
1001 0.70 NULL 5 2023-01-01
1001 0.72 0.70 4 2023-01-02
1001 0.74 0.72 3 2023-01-03
1001 0.76 0.74 2 2023-01-04
1001 0.78 0.76 1 2023-01-05
1001 0.60 NULL 5 2021-01-01
1001 0.62 0.60 4 2021-01-02
1001 0.64 0.62 3 2021-01-03
1001 0.66 0.64 2 2021-01-04
1001 0.68 0.66 1 2021-01-05

The desired outcome is to have the previous_value for rows of parent_rank = 1, to be the new averaged value.

Desired outcome

roi_id processed_value previous_value parent_rank timestamp
1001 0.76 NULL 2 2023-01-04
1001 0.78 0.73* 1 2023-01-05
1001 0.66 NULL 2 2021-01-04
1001 0.68 0.63* 1 2021-01-05

(*the averaged value)

Portion of the query

WITH ranked_all_value_summaries AS (
    SELECT tvp.rloi_id,
        tv.processed_value,
        tv.timestamp,
        tv.error,
        rank() OVER (PARTITION BY tvp.rloi_id, tvp.qualifier ORDER BY tv.timestamp DESC, tv.telemetry_value_id DESC) AS parent_rank
        FROM sls_telemetry_value tv
        JOIN sls_telemetry_value_parent tvp ON tv.telemetry_value_parent_id = tvp.telemetry_value_parent_id
            WHERE lower(tvp.parameter) = 'water level'::text
            AND lower(tvp.units) !~~ '%deg%'::text
            AND lower(tvp.qualifier) !~~ '%height%'::text
            AND lower(tvp.qualifier) <> 'crest tapping'::text
    ),
    latest_value_summaries_with_previous_value AS (
        SELECT ranked_all_value_summaries.rloi_id,
            ranked_all_value_summaries.qualifier,
            ranked_all_value_summaries.processed_value,
            ranked_all_value_summaries.timestamp,
            lag(ranked_all_value_summaries.processed_value, 1) OVER (
                PARTITION BY ranked_all_value_summaries.rloi_id, ranked_all_value_summaries.qualifier
                    ORDER BY ranked_all_value_summaries.rloi_id, ranked_all_value_summaries.qualifier, ranked_all_value_summaries.timestamp
            ) AS previous_value
            ranked_all_value_summaries.parent_rank FROM ranked_all_value_summaries WHERE ranked_all_value_summaries.parent_rank = ANY (
                ARRAY[1::bigint, 2::bigint, 3::bigint, 4::bigint, 5::bigint]
            )
  ) SELECT * FROM latest_value_summaries_with_previous_value;

Solution

  • Basically, you can adjust the window frame and just use avg():

        avg(rs.processed_value OVER (PARTITION BY rs.rloi_id, rs.qualifier
                                     ORDER BY rs.timestamp
                                     ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING)  -- !
    

    You don't need to repeat columns listed in the PARTITION clause in the ORDER BY clause.

    It's unclear what the result should be when there are fewer than 4 previous rows available.

    Related: