Search code examples
sqlhivemoving-averageweighted-average

weighted moving average with Hive SQL


Given a Hive table like below I want to calculate a weighted moving average. In table 1 is an example of such a weighted moving average. The value of every column is

col_value = (1 * n) + (0.75 * n-1) + (0.5 * n-2) + (0.25 * n-3) 
where n is the value at the current row, n-1 the value at the above row, etc.

More info on weighted moving averages on the corresponding Wikipedia section

Im stuck after calculating the moving average like so:

Code sample 1: Query so far

SELECT
    *,
    AVG (value) OVER (
        ORDER BY
            id
        ROWS BETWEEN
            3 PRECEDING AND CURRENT ROW
FROM
    table

Table 1:

id    value    weighted_moving_average   code_sample_1
...
11     0        0                        0
12     1        1                        0.25
13     0        0.75                     0.25
14     0        0.5                      0.25
15     0        0.25                     0.25
16     0        0                        0
...

Solution

  • If you want the window size to be easily changed, then the implementation would be substantially different.

    (Please note that I developed this in because I am not aware of any website I can test SQL. You may need to tweak some syntax.)

    Declare @thisManyRows int = 3;
    With rowNumber_cte As (
        Select 
            id, 
            [Value],
            RowNo = Row_Number() Over (Order By id)
          From Table1),
    windows_cte As (
        Select
            b_id = b.id,
            j_id = j.id,
            j.[Value],
            RowNo2 = @thisManyRows + 1 - Row_Number() Over (Partition By b.id Order By j.id desc),
            n = Count(j.[Value]) Over (Partition By b.id)
          From rowNumber_cte As b
          Join rowNumber_cte As j On j.RowNo Between b.RowNo - @thisManyRows + 1 And b.RowNo)
    Select
        id = b_id,
        [Value] = Sum(Case When b_id = j_id Then [Value] Else Null End),
        WeightedAverage = Sum([Value] * 1.0 * (RowNo2 *1.0 / @thisManyRows)) / Min(n)
      From windows_cte
      Group By b_id;
    

    I'm going to explain what this does because it's a bit cryptic:

    1. Set the variable to the number of rows to consider for the moving average (including the current row).
    2. In the rowNumber_cte we assign a row number, we do this because we don't assume that id will be sequential.
    3. In the windows_cte we join the rowNumber_cte on itself, so that each row is joined to all the rows that make up it's window of calculation (with reference to the @thisManyRows variable. This cte also introduces a new RowNo2 field which will basically end up being the numerator of the fraction for which this row's value contributes to the weighted moving average (i.e. the current row, if the window is 4 will have RowNo2 = 4 and will have 4/4 weight).
    4. windows_cte also calculates the number of rows in the window to better handle rows at the start of the data which don't have a full window.
    5. The final select groups by id, multiplies the [Value] of each row in the window by it's weight and then divides by the number of rows in the window: n.