Search code examples
sql-servert-sqlgroupinggroup

Group manually entered date values depending on whether they are continuously the same over system log dates


Example Situation: An order system tracks manually entered due dates by recording a system log date that is always unique (this would be a datetime, but I've used dates for simplicity).
I would like to assign a group or section to each due date grouping that exists continuous, chronologically, without changing the due date.

For example, if April 10 is entered as due date, later changed to April 15, and then changed back to April 10 again, this would be 3 distinct groups/sections. In a more complex model, the sectioning would allow deeper investigation as to why due date changes were entered.

Here is a sample table that has 2 orders, each with a similar set of due date changes.

CREATE TABLE #DueDates (OrderNo INT, DueDate Date, SysLogDate Date)
INSERT INTO #DueDates Values (1, '4/10/2022',   '1/10/2022')
                            ,(1, '4/10/2022',   '1/11/2022')
                            ,(1, '4/15/2022',   '1/15/2022')
                            ,(1, '4/15/2022',   '1/16/2022')
                            ,(1, '4/15/2022',   '1/17/2022')
                            ,(1, '4/10/2022',   '1/18/2022')
                            ,(1, '4/10/2022',   '1/19/2022')
                            ,(1, '4/10/2022',   '1/20/2022')
                            ,(2, '4/10/2022',   '2/16/2022')
                            ,(2, '4/10/2022',   '2/17/2022')
                            ,(2, '4/15/2022',   '2/18/2022')
                            ,(2, '4/15/2022',   '2/20/2022')
                            ,(2, '4/15/2022',   '2/21/2022')
                            ,(2, '4/10/2022',   '2/22/2022')
                            ,(2, '4/10/2022',   '2/24/2022')
                            ,(2, '4/10/2022',   '2/26/2022')

The script should be able to assign a section/group to the due dates of both orders at the same time, as follows:

OrderNo DueDate         SysLogDate      SectionNumber_WithinDueDate
1       2022-04-10      2022-01-10      1
1       2022-04-10      2022-01-11      1
1       2022-04-15      2022-01-15      2
1       2022-04-15      2022-01-16      2
1       2022-04-15      2022-01-17      2
1       2022-04-10      2022-01-18      3
1       2022-04-10      2022-01-19      3
1       2022-04-10      2022-01-20      3
2       2022-04-10      2022-02-16      1
2       2022-04-10      2022-02-17      1
2       2022-04-15      2022-02-18      2
2       2022-04-15      2022-02-20      2
2       2022-04-15      2022-02-21      2
2       2022-04-10      2022-02-22      3
2       2022-04-10      2022-02-24      3
2       2022-04-10      2022-02-26      3

Below is my initial attempt, using Dense_Rank():

Select *, Dense_Rank() OVER (Partition By OrderNo, DueDate Order By SysLogDate) as SectionNumber_WithinDueDate
From #DueDates

However, it groups all 4/10 due dates together within each order, instead of grouping them distinctly as two separate groups as above.

Incorrect result example:

OrderNo DueDate     SysLogDate      SectionNumber_WithinDueDate
1       2022-04-10  2022-01-10      1
1       2022-04-10  2022-01-11      2
1       2022-04-10  2022-01-18      3
1       2022-04-10  2022-01-19      4
1       2022-04-10  2022-01-20      5
1       2022-04-15  2022-01-15      1
1       2022-04-15  2022-01-16      2
1       2022-04-15  2022-01-17      3

Thank you for any ideas you may have!


Solution

  • This essentially requires a gaps and islands solution, which typically is done by subtracting a partitioned sequence from a global sequence.

    If you don't actually need the section number to be numbered sequentially and just really need to identify the separate groups you can disregard the outer select below.

    select OrderNo, DueDate, SysLogDate, 
      dense_rank() over(partition by orderno order by gp) SectionNumber_WithinDueDate
    from (
      select *,
        Row_Number() over(partition by OrderNo order by SysLogDate)
        - Row_Number() over(partition by OrderNo, DueDate order by SysLogDate) gp
      from #DueDates
    )t
    order by OrderNo, SysLogDate;
    

    Demo Fiddle