sql date pyspark

Flag the first 3 and last 2 working days in a calendar table


I have an auxiliary table in my database that contains all the days of the year with a column (is_working_day) that is a boolean flag indicating whether that day is a business day or not.

I would like to build a query on this table that marks the first three and the last two business days of each month with sequential numbers (1, 2, 3, 4, 5). Is this possible?

The output should be something like:

day         is_working_day  flag
01/01/2023  0               0
02/01/2023  1               1
03/01/2023  1               2
04/01/2023  1               0
05/01/2023  1               0
...         ...             ...
29/01/2023  1               3
30/01/2023  1               4
31/01/2023  1               5
01/02/2023  1               1
02/02/2023  1               2

The day and is_working_day columns already exist in my table; what I would like is some logic to create the "flag" column.


Solution

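  • If a SQL-only solution is acceptable, you can use window functions to generate the flag column. One example, which numbers the first two working days of each month 1 and 2 and the last three working days 3, 4 and 5, as in the sample output: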

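    with cte as (
        select *
             -- rn: running count of working days within the month, up to and including this day
             , sum(is_working_day) over (partition by datepart(year, day), datepart(month, day) order by day) as rn
             -- mv: total number of working days in the month
             , sum(is_working_day) over (partition by datepart(year, day), datepart(month, day)) as mv
        from t
    )
    select day
         , is_working_day
         , case
               -- first two working days of the month: flags 1 and 2
               when is_working_day = 1 and rn >= 1 and rn <= 2 then rn
               -- last three working days of the month: flags 3, 4 and 5
               when is_working_day = 1 and rn > mv - 3 then 5 - (mv - rn)
           end as flag
    from cte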
    

    Result:

    day         is_working_day  flag
    2023-01-01  0               null
    2023-01-02  1               1
    2023-01-03  0               null
    2023-01-04  1               2
    2023-01-05  1               3
    2023-01-29  1               4
    2023-01-30  0               null
    2023-01-31  1               5
    2023-02-01  1               1
    2023-02-02  1               2

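    The datepart() calls are SQL Server syntax; replace them with the date part extraction functions available in your RDBMS (for example, year(day) and month(day) in Spark SQL).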

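To illustrate the note about swapping the date part functions, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. It is only one possible port, not part of the original answer: it assumes the table is named t as above, that day is stored as ISO text (YYYY-MM-DD), and that the bundled SQLite is version 3.25 or newer (needed for window functions). The helper name flag_working_days and the sample rows are made up for the example; strftime('%Y-%m', day) stands in for the two datepart() calls.

```python
import sqlite3

# Hypothetical sample rows mirroring the result table above: (day, is_working_day).
ROWS = [
    ("2023-01-01", 0),
    ("2023-01-02", 1),
    ("2023-01-03", 0),
    ("2023-01-04", 1),
    ("2023-01-05", 1),
    ("2023-01-29", 1),
    ("2023-01-30", 0),
    ("2023-01-31", 1),
    ("2023-02-01", 1),
    ("2023-02-02", 1),
]

# Same logic as the answer's query; only the month partition expression changes.
QUERY = """
with cte as (
    select day
         , is_working_day
         -- running count of working days within the month
         , sum(is_working_day) over (partition by strftime('%Y-%m', day) order by day) as rn
         -- total number of working days in the month
         , sum(is_working_day) over (partition by strftime('%Y-%m', day)) as mv
    from t
)
select day
     , is_working_day
     , case
           when is_working_day = 1 and rn >= 1 and rn <= 2 then rn
           when is_working_day = 1 and rn > mv - 3 then 5 - (mv - rn)
       end as flag
from cte
order by day
"""


def flag_working_days(rows):
    """Load rows into a throwaway SQLite table and return (day, is_working_day, flag) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (day text primary key, is_working_day integer)")
        conn.executemany("insert into t values (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in flag_working_days(ROWS):
        print(row)
```

Non-working days come back with flag None (SQL null), as in the result above. If 0 is preferred, as in the question's sample, adding else 0 to the case expression should be enough.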