Search code examples
sqlpostgresqlwindow-functions

How do you create a moving window calculation in SQL for a time series? (not averages)


I am working with data where I need to use a calculation that is supposed to reference a previously calculated value in the previous row.

For example, take this dataset:

SELECT 
    generate_series('2015-01-01', '2019-12-01', '1 month'::interval)::date AS dates,
    generate_series(1,60) AS nums;

There's are NULL values starting at 2019-03-01.

I'd like to write a calculation on another column that fills it in based off the previous row, that is derived from that same calculation. So I tried to use some lag() functions. But after a while it turns to NULL, probably because the previous calculation is also null.

with
mynumbers AS (
    SELECT
        generate_series('2015-01-01', '2025-12-01', '1 month'::interval)::date AS dates,
        generate_series(1,50) AS nums),
mynumbers_lag AS (
    SELECT *, lag(nums) OVER (ORDER BY dates ASC) AS previous1
    FROM mynumbers)
SELECT dates, nums, previous1, (coalesce(nums,previous1)+lag(coalesce(nums,previous1), 5) OVER (ORDER BY dates ASC))*4 AS moving_calculation FROM mynumbers_lag;

The result starts to deviate from what I'd like it to be at 2019-03-01. I'd like my calculation to continue all the way through the table. Anyone know how I can accomplish this?

Edit: borrowing unutbu's table.. I want to yield this:

|      dates | nums | previous1 | moving_calculation |
|------------+------+-----------+--------------------|
| 2015-01-01 |    1 |           |                    |
| 2015-02-01 |    2 |         1 |                    |
| 2015-03-01 |    3 |         2 |                    |
| 2015-04-01 |    4 |         3 |                    |
| 2015-05-01 |    5 |         4 |                    |
| 2015-06-01 |    6 |         5 |                 28 |
| 2015-07-01 |    7 |         6 |                 36 |
| 2015-08-01 |    8 |         7 |                 44 |
| 2015-09-01 |    9 |         8 |                 52 |
| 2015-10-01 |   10 |         9 |                 60 |
...
| 2018-12-01 |   50 |        49 |                364 |
| 2019-01-01 |   50 |        49 |                372 |
| 2019-02-01 |   50 |        49 |                380 |
| 2019-03-01 |   50 |        49 |                388 |
| 2019-04-01 |   50 |        49 |               1744 | 
| 2019-05-01 |   50 |        49 |               7172 |
| 2019-06-01 |      |           |              28888 |
| 2019-07-01 |      |           |             117104 |
| 2019-08-01 |      |           |             475392 |
| 2019-09-01 |      |           |            1930256 |

On 2019-04-01, the 1744 is calculated from (388+48)*4. The 388 is one cell from the previously calculated value because nums is NULL. Eventually, starting on 2018-07-01, both nums are NULL, so it will calculating using only from moving_calculations (values 380 and 7172)


Solution

  • The values in the moving_calculation column (denoted m0 below) depend on prior values in the same column. They are defined by a recurrence relation. There might even be a closed-form formula for m0. You might want to ask a question on Mathematics stackexchange if you wish to find a closed-form formula. If we knew the closed-form formula, clearly computing values in Postgresql would be a breeze.

    However, if we regard this problem as a programming problem, then I believe the calculation -- if it is to be done in Postgresql -- is most easily expressed using WITH RECURSIVE. The calculation sort of feels like the calculation of Fibonacci numbers.

    WITH RECURSIVE r(a, b) AS (
      SELECT 0::int, 1::int
      UNION ALL
      SELECT b, a + b FROM r WHERE b < 50
    )
    SELECT a, b FROM r;
    

    yields

    |  a |  b |
    |----+----|
    |  0 |  1 |
    |  1 |  1 |
    |  1 |  2 |
    |  2 |  3 |
    |  3 |  5 |
    |  5 |  8 |
    |  8 | 13 |
    | 13 | 21 |
    | 21 | 34 |
    | 34 | 55 |
    

    If you understand the use of WITH RECURSIVE in that Fibonacci example, then I believe you'll see the solution below is merely an extension of the same idea.

    WITH RECURSIVE r(dates, nums, prev, m0, m1, m2, m3) AS (
    SELECT * FROM (VALUES ('2019-02-01'::date, 50::numeric, 47::numeric, 388::numeric, NULL::numeric, NULL::numeric, NULL::numeric)) AS t1
        UNION ALL
        SELECT (dates + '1 month'::interval)::date
            , m0
            , coalesce(m3, prev+1)
            , (m0+coalesce(m3, prev+1))*4
            , m0
            , m1
            , m2
        FROM r
        WHERE dates <= '2020-01-01'
    )
    SELECT * FROM r
    

    yields

    |      dates |       nums |     prev |         m0 |         m1 |        m2 |        m3 |
    |------------+------------+----------+------------+------------+-----------+-----------|
    | 2019-02-01 |         50 |       47 |        388 |            |           |           |
    | 2019-03-01 |        388 |       48 |       1744 |        388 |           |           |
    | 2019-04-01 |       1744 |       49 |       7172 |       1744 |       388 |           |
    | 2019-05-01 |       7172 |       50 |      28888 |       7172 |      1744 |       388 |
    | 2019-06-01 |      28888 |      388 |     117104 |      28888 |      7172 |      1744 |
    | 2019-07-01 |     117104 |     1744 |     475392 |     117104 |     28888 |      7172 |
    | 2019-08-01 |     475392 |     7172 |    1930256 |     475392 |    117104 |     28888 |
    | 2019-09-01 |    1930256 |    28888 |    7836576 |    1930256 |    475392 |    117104 |
    | 2019-10-01 |    7836576 |   117104 |   31814720 |    7836576 |   1930256 |    475392 |
    | 2019-11-01 |   31814720 |   475392 |  129160448 |   31814720 |   7836576 |   1930256 |
    | 2019-12-01 |  129160448 |  1930256 |  524362816 |  129160448 |  31814720 |   7836576 |
    | 2020-01-01 |  524362816 |  7836576 | 2128797568 |  524362816 | 129160448 |  31814720 |
    | 2020-02-01 | 2128797568 | 31814720 | 8642449152 | 2128797568 | 524362816 | 129160448 |
    

    To combine this table with the original table, use UNION:

    WITH mytable AS (
        SELECT *, (nums+prev)*4 AS m0, NULL::numeric AS m1, NULL::numeric AS m2, NULL::numeric AS m3
        FROM (
            SELECT *
                , lag(nums, 3) OVER (ORDER BY dates ASC) AS prev 
            FROM (
            SELECT
                generate_series('2015-01-01', '2025-12-01', '1 month'::interval)::date AS dates,
                generate_series(1,50)::numeric AS nums) t
            ) t2
        WHERE nums IS NOT NULL
    ), last_row AS (
        SELECT * FROM mytable
        WHERE nums IS NOT NULL
        ORDER BY dates DESC
        LIMIT 1 
    )
    SELECT * FROM mytable
    UNION (
        WITH RECURSIVE r(dates, nums, prev, m0, m1, m2, m3) AS (
            SELECT * FROM last_row
            UNION ALL
            SELECT (dates + '1 month'::interval)::date
                , m0
                , coalesce(m3, prev+1)
                , (m0+coalesce(m3, prev+1))*4
                , m0
                , m1
                , m2
            FROM r
            WHERE dates <= '2020-01-01')
        SELECT * FROM r)
    ORDER BY dates
    

    which yields

    |      dates |       nums |     prev |         m0 |         m1 |        m2 |        m3 |
    |------------+------------+----------+------------+------------+-----------+-----------|
    | 2015-01-01 |          1 |          |            |            |           |           |
    | 2015-02-01 |          2 |          |            |            |           |           |
    | 2015-03-01 |          3 |          |            |            |           |           |
    | 2015-04-01 |          4 |        1 |         20 |            |           |           |
    | 2015-05-01 |          5 |        2 |         28 |            |           |           |
    | 2015-06-01 |          6 |        3 |         36 |            |           |           |
    | 2015-07-01 |          7 |        4 |         44 |            |           |           |
    | 2015-08-01 |          8 |        5 |         52 |            |           |           |
    | 2015-09-01 |          9 |        6 |         60 |            |           |           |
    | 2015-10-01 |         10 |        7 |         68 |            |           |           |
    | 2015-11-01 |         11 |        8 |         76 |            |           |           |
    | 2015-12-01 |         12 |        9 |         84 |            |           |           |
    | 2016-01-01 |         13 |       10 |         92 |            |           |           |
    | 2016-02-01 |         14 |       11 |        100 |            |           |           |
    | 2016-03-01 |         15 |       12 |        108 |            |           |           |
    | 2016-04-01 |         16 |       13 |        116 |            |           |           |
    | 2016-05-01 |         17 |       14 |        124 |            |           |           |
    | 2016-06-01 |         18 |       15 |        132 |            |           |           |
    | 2016-07-01 |         19 |       16 |        140 |            |           |           |
    | 2016-08-01 |         20 |       17 |        148 |            |           |           |
    | 2016-09-01 |         21 |       18 |        156 |            |           |           |
    | 2016-10-01 |         22 |       19 |        164 |            |           |           |
    | 2016-11-01 |         23 |       20 |        172 |            |           |           |
    | 2016-12-01 |         24 |       21 |        180 |            |           |           |
    | 2017-01-01 |         25 |       22 |        188 |            |           |           |
    | 2017-02-01 |         26 |       23 |        196 |            |           |           |
    | 2017-03-01 |         27 |       24 |        204 |            |           |           |
    | 2017-04-01 |         28 |       25 |        212 |            |           |           |
    | 2017-05-01 |         29 |       26 |        220 |            |           |           |
    | 2017-06-01 |         30 |       27 |        228 |            |           |           |
    | 2017-07-01 |         31 |       28 |        236 |            |           |           |
    | 2017-08-01 |         32 |       29 |        244 |            |           |           |
    | 2017-09-01 |         33 |       30 |        252 |            |           |           |
    | 2017-10-01 |         34 |       31 |        260 |            |           |           |
    | 2017-11-01 |         35 |       32 |        268 |            |           |           |
    | 2017-12-01 |         36 |       33 |        276 |            |           |           |
    | 2018-01-01 |         37 |       34 |        284 |            |           |           |
    | 2018-02-01 |         38 |       35 |        292 |            |           |           |
    | 2018-03-01 |         39 |       36 |        300 |            |           |           |
    | 2018-04-01 |         40 |       37 |        308 |            |           |           |
    | 2018-05-01 |         41 |       38 |        316 |            |           |           |
    | 2018-06-01 |         42 |       39 |        324 |            |           |           |
    | 2018-07-01 |         43 |       40 |        332 |            |           |           |
    | 2018-08-01 |         44 |       41 |        340 |            |           |           |
    | 2018-09-01 |         45 |       42 |        348 |            |           |           |
    | 2018-10-01 |         46 |       43 |        356 |            |           |           |
    | 2018-11-01 |         47 |       44 |        364 |            |           |           |
    | 2018-12-01 |         48 |       45 |        372 |            |           |           |
    | 2019-01-01 |         49 |       46 |        380 |            |           |           |
    | 2019-02-01 |         50 |       47 |        388 |            |           |           |
    | 2019-03-01 |        388 |       48 |       1744 |        388 |           |           |
    | 2019-04-01 |       1744 |       49 |       7172 |       1744 |       388 |           |
    | 2019-05-01 |       7172 |       50 |      28888 |       7172 |      1744 |       388 |
    | 2019-06-01 |      28888 |      388 |     117104 |      28888 |      7172 |      1744 |
    | 2019-07-01 |     117104 |     1744 |     475392 |     117104 |     28888 |      7172 |
    | 2019-08-01 |     475392 |     7172 |    1930256 |     475392 |    117104 |     28888 |
    | 2019-09-01 |    1930256 |    28888 |    7836576 |    1930256 |    475392 |    117104 |
    | 2019-10-01 |    7836576 |   117104 |   31814720 |    7836576 |   1930256 |    475392 |
    | 2019-11-01 |   31814720 |   475392 |  129160448 |   31814720 |   7836576 |   1930256 |
    | 2019-12-01 |  129160448 |  1930256 |  524362816 |  129160448 |  31814720 |   7836576 |
    | 2020-01-01 |  524362816 |  7836576 | 2128797568 |  524362816 | 129160448 |  31814720 |
    | 2020-02-01 | 2128797568 | 31814720 | 8642449152 | 2128797568 | 524362816 | 129160448 |