Search code examples
sqlsortingt-sqlrow-numberdense-rank

Group manually entered date values depending on whether they are continuously the same over system log dates - Follow-Up Question


This is a follow-up question to my initial post

Example Situation: An order system tracks manually entered due dates by recording a system log date that is always unique (this would be a datetime, but I've used dates for simplicity, making each unique).

I would like to assign a section number to each due date grouping where the due date remains the same chronologically.

Stu's response solved the table in my initial post, but I've noticed that if I replace the 4/15/2022 due date associated with SysLogDate of 1/16/2022 to be 4/13/2022, the desired ordering does not seem to be maintained:

Note: 4/13/2022 date is an arbitrary change. The same issue occurs if I use any other unique date that is not yet already in the DueDate column. Ultimately, I also need to be able to handle changes to/from NULL, where someone 'forgets' to enter the date, but replacing the date with NULL also yields the same issue.

Updated Table:

CREATE TABLE #DueDates (OrderNo INT, DueDate Date, SysLogDate Date)
INSERT INTO #DueDates Values (1, '4/10/2022',   '1/10/2022')
                            ,(1, '4/10/2022',   '1/11/2022')
                            ,(1, '4/15/2022',   '1/15/2022')
                            ,(1, '4/13/2022',   '1/16/2022')  -- Due Date Altered since prior post
                            ,(1, '4/15/2022',   '1/17/2022')
                            ,(1, '4/10/2022',   '1/18/2022')
                            ,(1, '4/10/2022',   '1/19/2022')
                            ,(1, '4/10/2022',   '1/20/2022')
                            ,(2, '4/10/2022',   '2/16/2022')
                            ,(2, '4/10/2022',   '2/17/2022')
                            ,(2, '4/15/2022',   '2/18/2022')
                            ,(2, '4/15/2022',   '2/20/2022')
                            ,(2, '4/15/2022',   '2/21/2022')
                            ,(2, '4/10/2022',   '2/22/2022')
                            ,(2, '4/10/2022',   '2/24/2022')
                            ,(2, '4/10/2022',   '2/26/2022') 

Desired Results Are:

    OrderNo DueDate         SysLogDate      SectionNumber_WithinDueDate
    1       2022-04-10      2022-01-10      1
    1       2022-04-10      2022-01-11      1
    1       2022-04-15      2022-01-15      2
    1       2022-04-13      2022-01-16      3
    1       2022-04-15      2022-01-17      4
    1       2022-04-10      2022-01-18      5
    1       2022-04-10      2022-01-19      5
    1       2022-04-10      2022-01-20      5
    2       2022-04-10      2022-02-16      1
    2       2022-04-10      2022-02-17      1
    2       2022-04-15      2022-02-18      2
    2       2022-04-15      2022-02-20      2
    2       2022-04-15      2022-02-21      2
    2       2022-04-10      2022-02-22      3
    2       2022-04-10      2022-02-24      3
    2       2022-04-10      2022-02-26      3

...but applying the solution from my prior post to this updated table yields:

    OrderNo DueDate         SysLogDate      SectionNumber_WithinDueDate
    1       2022-04-10      2022-01-10      1
    1       2022-04-10      2022-01-11      1
    1       2022-04-15      2022-01-15      2
    1       2022-04-13      2022-01-16      3 **
    1       2022-04-15      2022-01-17      3 **
    1       2022-04-10      2022-01-18      3 **
    1       2022-04-10      2022-01-19      3 **
    1       2022-04-10      2022-01-20      3 **
    2       2022-04-10      2022-02-16      1
    2       2022-04-10      2022-02-17      1
    2       2022-04-15      2022-02-18      2
    2       2022-04-15      2022-02-20      2
    2       2022-04-15      2022-02-21      2
    2       2022-04-10      2022-02-22      3
    2       2022-04-10      2022-02-24      3
    2       2022-04-10      2022-02-26      3

Here's a demo to work that uses the above updated table and the solution from my prior post, and shows the above non-desired results: Fiddle

Demo showing same effect when the date is replaced with NULL: Fiddle with NULL

Copy of the selected solution from my prior post (used in the above Fiddles):

select OrderNo, DueDate, SysLogDate, 
      dense_rank() over(partition by orderno order by gp) SectionNumber_WithinDueDate
    from (
      select *,
        Row_Number() over(partition by OrderNo order by SysLogDate)
        - Row_Number() over(partition by OrderNo, DueDate order by SysLogDate) gp
      from #DueDates
    )t
    order by OrderNo, SysLogDate;

It's a small change in the data, but I haven't been able to work out how to alter the 'Row_Number difference line' in the subquery to get the desired results.

Thank you for any advice you can offer here :)

Edit: In hindsight, I realized the the section number field would be more appropriately named "SectionNumber_WithinOrder" instead of "SectionNumber_WithinDueDate," but I've left things as-is to keep the names aligned with the code samples that were proposed.


Solution

  • For gap and island problem, I prefer to use lag() window function as it is easier to understand.

    Use lag() to compare previous row value and when changed, set a flag (value 1). Perform a cumulative sum on the flag and you get the grp. Use dense_rank() on the grp and it gives you your SectionNumber_WithinDueDate

    As you have NULL value, use ISNULL() to return a date value (99991231) for comparison

    select OrderNo, DueDate, SysLogDate, 
           SectionNumber_WithinDueDate = dense_rank() over (partition by OrderNo 
                                                                order by grp)
    from
    (
        select *, grp = sum(g) over (partition by OrderNo 
                                         order by SysLogDate)
        from   
        ( 
               select *,
                      g = case when isnull(DueDate, '99991231')
                               <>   isnull(lag(DueDate) over (partition by OrderNo 
                                                                  order by SysLogDate), '99991231')
                               then 1
                               else 0
                               end
               from   #DueDates
       ) d
    ) d
    order by OrderNo, SysLogDate;
    

    Fiddle on your sample data :

    fiddle 1

    fiddle 2