Search code examples
sqlsql-serverjoinrow-numberpartition-by

SQL: How do you make a second grouping based off non-sequential dates?


I'm using SQL Server 2008 R2. I have table called Name with the following structure and sample data:

ID         Date      Name         DOD
10001      200911    Kevin H      06/17/2000
10001      200912    Kevin        06/20/2000
10001      201001    Kevin        06/20/2000
10001      201012    K            06/20/2000
10001      201101    K            06/20/2000
10001      201406    Kevin        06/20/2000

Notice that the Employee 10001 has been changing names 3 times and DODs once over time. What I am trying to do is to group by ID, Name, and DOD such that the data is consistent between dates. I also need to grab the min and max date for these groups and ensure that the dates are in sequential order. If the name or DOD changes and then changes back what it was previously, a new group would need to be created.

So, the output will look like this:

EmployeeID MinDate  MaxDate Name     DOD 
10001      200911   200911  Kevin H  06/17/2000
10001      200912   201001  Kevin    06/20/2000
10001      201012   201101  K        06/20/2000
10001      201406   201406  Kevin    06/20/2000 

The Name table is quite large so there will be instances where Data is consistent for 20 months, then inconsistent for 1 month, and then back to being consistent for 20 months.

Thank you in advance and please let me know if you need additional information.


Solution

  • You can use the difference of row numbers approach:

    select employeeid, min(date) as mindate, max(date) as maxdate,
           name, dod
    from (select t.*,
                 row_number() over (partition by employeeid order by date) as seqnum_d,
                 row_number() over (partition by employeeid, name, dod order by date) as seqnum_dnd
          from t
         ) t
    group by employeeid, (seqnum_d - seqnum_dnd);
    

    To see how this works, run the subquery. If you stare at the results for a while, you'll get why the difference of row numbers works in this case.