Search code examples
sqloraclewindow-functions

oracle sql question, row between preceding


I need help understanding the below part of an Oracle SQL query. What does BETWEEN 7 PRECEDING AND 7 PRECEDING do? I understand that a vehicle can have more than one repo date if repo-ed more than once per that account. However, this code is building repo dates 1-7 like this and I am not sure what this is doing exactly. If someone could please explain, much appreciated. Thank you

, MIN(D0AL1.CONFIRM_DATE) OVER (PARTITION BY D0AL2.ACCOUNT_NBR
                    ORDER BY D0AL1.ASSIGNMENT_DATE ROWS BETWEEN 7 PRECEDING AND 7 PRECEDING) AS repo_date1
            , MIN(D0AL1.CONFIRM_DATE) OVER (PARTITION BY D0AL2.ACCOUNT_NBR
                    ORDER BY D0AL1.ASSIGNMENT_DATE ROWS BETWEEN 6 PRECEDING AND 6 PRECEDING) AS repo_date2
            , MIN(D0AL1.CONFIRM_DATE) OVER (PARTITION BY D0AL2.ACCOUNT_NBR
                    ORDER BY D0AL1.ASSIGNMENT_DATE ROWS BETWEEN 5 PRECEDING AND 5 PRECEDING) AS repo_date3

Solution

  • From the documentation:

    Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window ...

    and

    ROWS | RANGE

    These keywords define for each row a window (a physical or logical set of rows) used for calculating the function result. The function is then applied to all the rows in the window. The window moves through the query result set or partition from top to bottom.

    • ROWS specifies the window in physical units (rows).

    • RANGE specifies the window as a logical offset.

    ...

    value_expr PRECEDING or value_expr FOLLOWING

    For RANGE or ROW`:

    • If value_expr FOLLOWING is the start point, then the end point must be value_expr FOLLOWING.

    • If value_expr PRECEDING is the end point, then the start point must be value_expr PRECEDING.

    Each of your clauses is using the same value_expr for both preceding and following, so the window is being limited to exactly 1 row; looking back 7 rows for the first, 6 rows for the second, etc.

    As a demo of what that generates:

    with t (id, dt) as (
      select level, date '2018-01-01' + (level * 3)
      from dual
      connect by level <= 10
    )
    select id, dt,
      min(id) over (order by dt rows between 7 preceding and 7 preceding) as a,
      min(id) over (order by dt rows between 6 preceding and 6 preceding) as b,
      min(id) over (order by dt rows between 5 preceding and 5 preceding) as c
    from t
    order by dt;
    
            ID DT                  A          B          C
    ---------- ---------- ---------- ---------- ----------
             1 2018-01-04                                 
             2 2018-01-07                                 
             3 2018-01-10                                 
             4 2018-01-13                                 
             5 2018-01-16                                 
             6 2018-01-19                                1
             7 2018-01-22                     1          2
             8 2018-01-25          1          2          3
             9 2018-01-28          2          3          4
            10 2018-01-31          3          4          5
    

    The generated columns a, b and c are looking back 7, 6 and 5 rows respectively to find the value to use. If there is no matching row that far back then the result is null.


    Also notice that the analytic clause is ordered by a date value, and those dates are not contiguous - but the IDs returned are. That's because it's looking at the rows in that order, not the actual values they contain. If you used a range window instead:

    select id, dt,
      min(id) over (order by dt range between 7 preceding and 7 preceding) as a,
      min(id) over (order by dt range between 6 preceding and 6 preceding) as b,
      min(id) over (order by dt range between 5 preceding and 5 preceding) as c
    from t
    order by dt;
    
            ID DT                  A          B          C
    ---------- ---------- ---------- ---------- ----------
             1 2018-01-04                                 
             2 2018-01-07                                 
             3 2018-01-10                     1           
             4 2018-01-13                     2           
             5 2018-01-16                     3           
             6 2018-01-19                     4           
             7 2018-01-22                     5           
             8 2018-01-25                     6           
             9 2018-01-28                     7           
            10 2018-01-31                     8           
    

    ... the results would be very different, and you'd only see IDs whose date values are 5, 6 and 7 days back - and as I made all the gaps 3 days in the CTE, there are only matches in result column b. None of the rows have anything 5 or 7 days before, they only have rows 6 days before (as a multipel of 3), and the first couple of rows still have no match that far back. If you extended this to 3-days-preceding you'd see matches for those too.