Search code examples
sql-serverranking

SQL rownumber by columngroup


Im trying to create a report of workactivities. Where the following conditions are set.

If one is absent on a subsequent date-range, then mark as same absence-group, else mark as new absence-group.

So, if I am absent on monday, tuesday, wednesday, then that is one group, marked by a 1.

If I am then at work on thursday, but absent again friday, than friday is marked with group2.

If i am still absent on the following monday then that is still group 2.

The example data provided, is the type of data I get from the shift-scheduling tables I have, apart from identifiers if a particular activity is absence or non-absence. There are of course more than one set of initials, on any given day, along with different types of activities and absenceactivities.

I've tried to supply a minimal working example of the input data and the desired outcome. Hopefully I can get some pointers on this.

USE Sandbox 
DROP TABLE Data /* Clean up after ourselves. */
CREATE TABLE Data ( /* Create table */
[Date] DATE,
Initials VARCHAR(10),
Activity VARCHAR(255),
ActivityType VARCHAR(255)
);

/* Insert data */
INSERT INTO [dbo].[Data] ([Date],[Initials],[Activity],[ActivityType]) VALUES ('2016-12-05','PersonA','Work','Work')
INSERT INTO [dbo].[Data] ([Date],[Initials],[Activity],[ActivityType]) VALUES ('2016-12-06','PersonA','Work','Work')
INSERT INTO [dbo].[Data] ([Date],[Initials],[Activity],[ActivityType]) VALUES ('2016-12-07','PersonA','AbsenceActivity','Absence')
INSERT INTO [dbo].[Data] ([Date],[Initials],[Activity],[ActivityType]) VALUES ('2016-12-08','PersonA','AbsenceActivity','Absence')
INSERT INTO [dbo].[Data] ([Date],[Initials],[Activity],[ActivityType]) VALUES ('2016-12-09','PersonA','AbsenceActivity','Absence')
INSERT INTO [dbo].[Data] ([Date],[Initials],[Activity],[ActivityType]) VALUES ('2016-12-10',NULL,NULL,'NoShift')
INSERT INTO [dbo].[Data] ([Date],[Initials],[Activity],[ActivityType]) VALUES ('2016-12-11',NULL,NULL,'NoShift')
INSERT INTO [dbo].[Data] ([Date],[Initials],[Activity],[ActivityType]) VALUES ('2016-12-12','PersonA','AbsenceActivity','Absence')
INSERT INTO [dbo].[Data] ([Date],[Initials],[Activity],[ActivityType]) VALUES ('2016-12-13','PersonA','Work','Work')
INSERT INTO [dbo].[Data] ([Date],[Initials],[Activity],[ActivityType]) VALUES ('2016-12-14','PersonA','AbsenceActivity','Absence')
INSERT INTO [dbo].[Data] ([Date],[Initials],[Activity],[ActivityType]) VALUES ('2016-12-15','PersonA','AbsenceActivity','Absence')

This sort of gives me what I want, but I can't get the rownumber, or rank, or dense_rank to reset, when a new group is introduced. Desired outcome towards the bottom.

    SELECT  [Date]
           ,Initials
           ,activity
           ,ActivityType
           ,rank() OVER (PARTITION BY data.activitytype, Initials ORDER BY data.date,data.initials) rownumber
    FROM    data
    GROUP BY data.date, data.initials, activity,ActivityType
ORDER BY date

Desired outcome - Full work activity date range

SELECT '2016-12-05' as [Date]   ,'PersonA' AS [Initials]    ,'Work'             AS [Activity]   ,'Work'     AS [ActivityType]   ,'0' AS [identifier]    union all
SELECT '2016-12-06' as [Date]   ,'PersonA' AS [Initials]    ,'Work'             AS [Activity]   ,'Work'     AS [ActivityType]   ,'0' AS [identifier]    union all
SELECT '2016-12-07' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'1' AS [identifier]    union all
SELECT '2016-12-08' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'1' AS [identifier]    union all
SELECT '2016-12-09' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'1' AS [identifier]    union ALL
SELECT '2016-12-12' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'1' AS [identifier]    union all
SELECT '2016-12-13' as [Date]   ,'PersonA' AS [Initials]    ,'Work'             AS [Activity]   ,'Work'     AS [ActivityType]   ,'0' AS [identifier]    union all
SELECT '2016-12-14' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'2' AS [identifier]    union all
SELECT '2016-12-15' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'2' AS [identifier]    

Alternative outcome 1 Full date-range including weekens or non-working dates

SELECT '2016-12-05' as [Date]   ,'PersonA' AS [Initials]    ,'Work'             AS [Activity]   ,'Work'     AS [ActivityType]   ,'0' AS [identifier]    union all
SELECT '2016-12-06' as [Date]   ,'PersonA' AS [Initials]    ,'Work'             AS [Activity]   ,'Work'     AS [ActivityType]   ,'0' AS [identifier]    union all
SELECT '2016-12-07' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'1' AS [identifier]    union all
SELECT '2016-12-08' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'1' AS [identifier]    union all
SELECT '2016-12-09' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'1' AS [identifier]    union ALL
SELECT '2016-12-10' as [Date]   ,NULL       AS [Initials]   ,'Weekend'  AS [Activity]   ,'noShift' AS [ActivityType]    ,NULL AS [identifier]   union ALL
SELECT '2016-12-11' as [Date]   ,NULL       AS [Initials]   ,'Weekend' AS [Activity]    ,'noShift' AS [ActivityType]    ,NULL AS [identifier]   union ALL
SELECT '2016-12-12' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'1' AS [identifier]    union all
SELECT '2016-12-13' as [Date]   ,'PersonA' AS [Initials]    ,'Work'             AS [Activity]   ,'Work'     AS [ActivityType]   ,'0' AS [identifier]    union all
SELECT '2016-12-14' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'2' AS [identifier]    union all
SELECT '2016-12-15' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'2' AS [identifier]    

Alternative outcome 2 Only included dates where a set of initials has activities on

SELECT '2016-12-07' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'1' AS [identifier]    union all
SELECT '2016-12-08' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'1' AS [identifier]    union all
SELECT '2016-12-09' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'1' AS [identifier]    union all
SELECT '2016-12-10' as [Date]   ,NULL AS [Initials] ,'Weekend'  AS [Activity]   ,'noShift' AS [ActivityType]    ,NULL AS [identifier]   union ALL
SELECT '2016-12-11' as [Date]   ,NULL AS [Initials] ,'Weekend' AS [Activity]    ,'noShift' AS [ActivityType]    ,NULL AS [identifier]   union ALL
SELECT '2016-12-12' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'1' AS [identifier]    union all
SELECT '2016-12-14' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'2' AS [identifier]    union all
SELECT '2016-12-15' as [Date]   ,'PersonA' AS [Initials]    ,'AbsenceActivity'  AS [Activity]   ,'Absence'  AS [ActivityType]   ,'2' AS [identifier]    

Solution

  • Using with (common table expression) and row_number() we can generate row numbers for partitions (Initials, activitytype) and (Initials).

    By comparing those, we can group consecutive blocks by activity and number them using dense_rank().

    To convert all 'Work' to 0 I just used a case expression in the final query.

    rextester: http://rextester.com/AYR27547

    with cte as (
      select  
            [Date]
          , Initials
          , activity
          , ActivityType
          , irn=row_number() over (
              partition by Initials 
              order by [date]
              )
          , atrn=row_number() over (
              partition by Initials, activitytype 
              order by [date]
              )
        from [data] d
        group by [Date], Initials, Activity, ActivityType
        )
    
    select 
          [date]
        , Initials
        , activity
        , ActivityType
        , ActivityGroup = case 
            when ActivityType='Work' 
              then 0 
            else dense_rank() over (
                partition by Initials, ActivityType 
                order by irn-atrn
                )
            end
      from cte
      order by [date]
    

    Results:

    +------------+----------+-----------------+--------------+---------------+
    |    date    | Initials |    activity     | ActivityType | ActivityGroup |
    +------------+----------+-----------------+--------------+---------------+
    | 2016-12-05 | PersonA  | Work            | Work         |             0 |
    | 2016-12-06 | PersonA  | Work            | Work         |             0 |
    | 2016-12-07 | PersonA  | AbsenceActivity | Absence      |             1 |
    | 2016-12-08 | PersonA  | AbsenceActivity | Absence      |             1 |
    | 2016-12-09 | PersonA  | AbsenceActivity | Absence      |             1 |
    | 2016-12-10 | NULL     | NULL            | NoShift      |             1 |
    | 2016-12-11 | NULL     | NULL            | NoShift      |             1 |
    | 2016-12-12 | PersonA  | AbsenceActivity | Absence      |             1 |
    | 2016-12-13 | PersonA  | Work            | Work         |             0 |
    | 2016-12-14 | PersonA  | AbsenceActivity | Absence      |             2 |
    | 2016-12-15 | PersonA  | AbsenceActivity | Absence      |             2 |
    +------------+----------+-----------------+--------------+---------------+