Search code examples
sql-serverwindow

Creating rolling window for time series data in SQL


I have a question regarding adding rolling window column in SQL. Table A is a sample of 24 months time series data. I need to add column for difference between each month balances with pervious month and a month before pervious month. For example for Mar 2020 I need to have difference between Mar and Feb and also Mar and Jan for Deposit and Withdraw separately for each ID (Table B). I try to use 'window' function in sql but I do not know how.

 **Table A**
 ID      | Date      |A      | B       |
+--------+-----------+-------+---------
| 1      | Jan 20    | $200  | $100    |
| 1      | Feb 20    | $500  | $250    |
| 1      | Mar 20    | $1000 | $550    | 
+--------+-----------+-------+---------+

I want results like this:

 **Table B**
 ID      | Date      |A      | B       | A(Mar-Feb)| A(Mar-Jan)| B(Mar-Feb)| B(Mar-Jan)|
+--------+-----------+-------+------------------------------------------------------
| 1      | Jan 20    | $200  | $100    |            |           |          |           |
| 1      | Feb 20    | $500  | $250    |            |           |          |           |
| 1      | Mar 20    | $1000 | $550    | $500       |$800       |$300      |$450       |
+--------+-----------+-------+---------+------------+-----------+----------+-----------+

I'd really appreciated if someone can help me.


Solution

  • Edited: See edit at bottom for corrected answer based on more information from OP

    I "think" this is what you're asking for and it may not perfectly be what you want, because it fills in the other rows as well...

    IF OBJECT_ID('tempdb..#TableA','U') IS NOT NULL DROP TABLE #TableA; --SELECT * FROM #TableA
    CREATE TABLE #TableA (
        ID     int  NOT NULL,
        [Date] date NOT NULL,
        A      int  NOT NULL,
        B      int  NOT NULL,
    )
    
    INSERT INTO #TableA (ID, Date, A, B)
    VALUES (1, '2020-01-01',  200, 100)
        ,  (1, '2020-02-01',  500, 250)
        ,  (1, '2020-03-01', 1000, 550)
    
    SELECT ta.ID
        , [Date] = FORMAT(ta.[Date],'MMM yy')
        , ta.A, ta.B
        , A_DiffPrev  = ta.A - LAG(ta.A)         OVER (ORDER BY ta.[Date])
        , A_DiffFirst = ta.A - FIRST_VALUE(ta.A) OVER (ORDER BY ta.[Date])
        , B_DiffPrev  = ta.B - LAG(ta.B)         OVER (ORDER BY ta.[Date])
        , B_DiffFirst = ta.B - FIRST_VALUE(ta.B) OVER (ORDER BY ta.[Date])
    FROM #TableA ta
    

    Returns:

    | ID | Date   | A    | B   | A_DiffPrev | A_DiffFirst | B_DiffPrev | B_DiffFirst | 
    |----|--------|------|-----|------------|-------------|------------|-------------| 
    | 1  | Jan 20 | 200  | 100 | NULL       | 0           | NULL       | 0           | 
    | 1  | Feb 20 | 500  | 250 | 300        | 300         | 150        | 150         | 
    | 1  | Mar 20 | 1000 | 550 | 500        | 800         | 300        | 450         | 
    

    Explanation

    • LAG(ta.A) OVER (ORDER BY ta.[Date]) - This will give you the previous value as sorted by the provided ORDER BY. So in this case, it's saying, give me the value that occurs prior to the current row, if you sort by [Date] Ascending
    • FIRST_VALUE(ta.A) OVER (ORDER BY ta.[Date]) - Similar idea to LAG() except it's saying to get the very first item, rather than the previous item.

    Edit

    In the comments you mentioned that FIRST_VALUE() will not work for you because you don't want to compare with the first month, you want to compare with the previous month and two months back.

    In that case, you can use this solution:

    SELECT ta.ID
        , [Date] = FORMAT(ta.[Date],'MMM yy')
        , ta.A, ta.B
        , A_DiffPrev1 = ta.A - LAG(ta.A,1) OVER (ORDER BY ta.[Date])
        , A_DiffPrev2 = ta.A - LAG(ta.A,2) OVER (ORDER BY ta.[Date])
        , B_DiffPrev1 = ta.B - LAG(ta.B,1) OVER (ORDER BY ta.[Date])
        , B_DiffPrev2 = ta.B - LAG(ta.B,2) OVER (ORDER BY ta.[Date])
    FROM #TableA ta
    

    Explanation:

    In this change, I'm using LAG() for everything. But instead, I'm telling LAG() how many rows I want it to look back.

    So to get the previous month, I say LAG(A, 1) which means to grab the previous row, which is the default, I'm only providing it here to make it more explicitly clear what is happening.

    Then I say LAG(A, 2) which means to go back two rows and grab that value.

    NOTE: This is all assuming you do not have gaps in your data.