Search code examples
oracledatesql-order-bywindow-functionsrecursive-query

Oracle values change when row_number() added


I have an Oracle query that uses DUAL to produce a list of dates in a subquery, and a case when to identify business days:

    SELECT DATES
        ,case when to_char(DATES, 'd') in (1,7)
         then 0
        else 1 end as business_day
    FROM (
        SELECT to_date('1/1/2020','MM/DD/YYYY') + (LEVEL -1) AS DATES
        FROM DUAL connect by level <=(to_date('1/1/2021','MM/DD/YYYY') - to_date('1/1/2020','MM/DD/YYYY'))
    ) L1

So far so good. Now when I nest this in a subquery, and add a row_number() function, all my business_day values become 0. If I remove the row_number() function, business_day goes back to normal.

SELECT L2.DATES
, L2.business_day
, row_number() OVER (PARTITION BY L2.business_day ORDER BY L2.DATES ASC) as dateindex
FROM (
    SELECT DATES
        ,case when to_char(DATES, 'd') in (1,7)
         then 0
        else 1 end as business_day
    FROM (
        SELECT to_date('1/1/2020','MM/DD/YYYY') + (LEVEL -1) AS DATES
        FROM DUAL connect by level <=(to_date('1/1/2021','MM/DD/YYYY') - to_date('1/1/2020','MM/DD/YYYY'))
    ) L1
) L2

Any idea how adding a new column causes another's values to change?


Solution

  • I suspect you aren't paying attention to the actual dates; running your code in this db<>fiddle, the first query returns:

    DATES     | BUSINESS_DAY
    :-------- | -----------:
    01-JAN-20 |            1
    02-JAN-20 |            1
    03-JAN-20 |            1
    04-JAN-20 |            1
    05-JAN-20 |            0
    06-JAN-20 |            0
    07-JAN-20 |            1
    08-JAN-20 |            1
    09-JAN-20 |            1
    10-JAN-20 |            1
    11-JAN-20 |            1
    ...
    

    while the second returns:

    DATES     | BUSINESS_DAY | DATEINDEX
    :-------- | -----------: | --------:
    05-JAN-20 |            0 |         1
    06-JAN-20 |            0 |         2
    12-JAN-20 |            0 |         3
    13-JAN-20 |            0 |         4
    19-JAN-20 |            0 |         5
    20-JAN-20 |            0 |         6
    26-JAN-20 |            0 |         7
    27-JAN-20 |            0 |         8
    02-FEB-20 |            0 |         9
    03-FEB-20 |            0 |        10
    ...
    

    All the business_day values are indeed zero... or at least, if you only look at the start of the result set. If you look further down:

    ...
    27-DEC-20 |            0 |       103
    28-DEC-20 |            0 |       104
    01-JAN-20 |            1 |         1
    02-JAN-20 |            1 |         2
    03-JAN-20 |            1 |         3
    04-JAN-20 |            1 |         4
    ...
    

    You don't have an order-by clause, and the analytic processing internally happens to return in an order you aren't expecting. If you add an order-by then it looks more sensible, as in this db<>fiddle:

    DATES     | BUSINESS_DAY | DATEINDEX
    :-------- | -----------: | --------:
    01-JAN-20 |            1 |         1
    02-JAN-20 |            1 |         2
    03-JAN-20 |            1 |         3
    04-JAN-20 |            1 |         4
    05-JAN-20 |            0 |         1
    06-JAN-20 |            0 |         2
    07-JAN-20 |            1 |         5
    08-JAN-20 |            1 |         6
    09-JAN-20 |            1 |         7
    10-JAN-20 |            1 |         8
    11-JAN-20 |            1 |         9
    12-JAN-20 |            0 |         3
    ...
    

    Incidentally, the 'd' format element is NLS-sensitive, so someone else running this code in a session with different settings could see different results. It would safer to do:

    when to_char(DATES, 'Dy', 'NLS_DATE_LANGUAGE=ENGLISH') in ('Sat', 'Sun')