Search code examples
sql-servercommon-table-expressionsql-server-2017

Group records when times are sequential within a given minute interval


In SQL Server 2017, I am trying to group multiple records into a single record when teacherid and customerid are the same and when the scheduled time is immediately 50 minutes after the previous record, ie.

Table: Schedule

----------------------------------------------------------------
id  custormerid teacherid   schedule
----------------------------------------------------------------
571654  1085        46      2022-02-22 07:00:00.000
571657  1085        46      2022-02-25 07:00:00.000
571658  1085        46      2022-02-26 07:00:00.000
571659  1085        46      2022-02-26 07:50:00.000
571660  1085        46      2022-02-26 08:40:00.000
571666  1085        46      2022-02-26 10:20:00.000
571661  1085        46      2022-02-28 07:00:00.000
571662  1085        46      2022-02-28 07:50:00.000
571663  1085        11      2022-02-28 08:40:00.000
571664  1085        46      2022-02-24 07:00:00.000
571665  1085        46      2022-02-24 07:50:00.000

Desired result

    --------------------------------------------------------------------------
    custormerid     teacherid   schedule            Qty
    --------------------------------------------------------------------------
    1085        46      2022-02-22 07:00:00.000     1
    1085        46      2022-02-25 07:00:00.000     1
    1085        46      2022-02-26 07:00:00.000     3
    1085        46      2022-02-26 10:20:00.000     1
    1085        46      2022-02-28 07:00:00.000     2
    1085        11      2022-02-28 08:40:00.000     1
    1085        46      2022-02-24 07:00:00.000     2

DDL Script:

CREATE TABLE [dbo].[Schedule](
    [id] [int] NOT NULL,
    [custormerid] [int] NULL,
    [teacherid] [int] NULL,
    [schedule] [datetime] NULL) 
    

INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571654, 1085, 46, CAST(N'2022-02-22T07:00:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571657, 1085, 46, CAST(N'2022-02-25T07:00:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571658, 1085, 46, CAST(N'2022-02-26T07:00:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571659, 1085, 46, CAST(N'2022-02-26T07:50:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571660, 1085, 46, CAST(N'2022-02-26T08:40:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571661, 1085, 46, CAST(N'2022-02-28T07:00:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571662, 1085, 46, CAST(N'2022-02-28T07:50:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571663, 1085, 11, CAST(N'2022-02-28T08:40:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571664, 1085, 46, CAST(N'2022-02-24T07:00:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571665, 1085, 46, CAST(N'2022-02-24T07:50:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571666, 1085, 46, CAST(N'2022-02-24T10:20:00.000' AS DateTime))

I saw some examples using CETs, but I didn't really understand how to get to that result


Solution

  • I think you do not need any CTE to solve that problem. By using Group by you can do that =>

    SELECT [custormerid],[teacherid],MIN([schedule]) [schedule],
           COUNT(*) Qty
     FROM [dbo].[Schedule]
    GROUP BY [custormerid],[teacherid],CAST([schedule] AS DATE)
    

    Update: Now, if you want maintain the sequence of the output as you provided in the question then you can use CTE like below.

    WITH CTE AS
    (
        SELECT MAX([id]) MyID,[custormerid],[teacherid],MIN([schedule]) [schedule],
               COUNT(*) Qty
         FROM [dbo].[Schedule]
        GROUP BY [custormerid],[teacherid],CAST([schedule] AS DATE)
    ) 
    SELECT [custormerid],[teacherid],[schedule],Qty FROM CTE ORDER BY MyID
    

    Output:

    enter image description here

    Final Update: Now, I have used several techniques to achieve what asked for. You said in the comment that you need count them with the diff of 50 min. So, I did that with multiple CTE. The query is providing the perfect output I believe for small amount of data but I did not test is for large amount of data. Another thing is that, the performance of query might be slow for large amount of data. So, please try it and let me know =>

    WITH myCTE AS
    (
     SELECT *,
          --I have used DENSE_RANK for Ordering the rows by [schedule]
          DENSE_RANK() OVER(PARTITION BY [custormerid],[teacherid],CAST([schedule] AS DATE) ORDER BY ID,CAST([schedule] AS DATE)) DRNK
     FROM [dbo].[Schedule]
    ),
    CTE AS
    (
        --Finding the parentID using logic of 50 min and others
        SELECT *,
        CASE WHEN DATEADD(MINUTE, -50,  [schedule])<=LAG([schedule],1) OVER(ORDER BY (SELECT NULL)) 
                  AND [teacherid]=LAG([teacherid],1) OVER(ORDER BY (SELECT NULL))
                  AND [custormerid]=LAG([custormerid],1) OVER(ORDER BY (SELECT NULL))
                  THEN LAG(ID,1) OVER(ORDER BY (SELECT NULL)) 
                  ELSE NULL END AS parentid
        FROM myCTE
    ),
    myY AS
    (   
       --Grouping the items Using Tree concept
        SELECT CTE.*, id AS rootid FROM CTE
        WHERE parentid IS NULL
        UNION ALL
        SELECT C.*, P.rootid FROM myY AS P
        INNER JOIN CTE AS C ON P.id = C.parentid
    )
    SELECT
        MAX([custormerid]) [custormerid],
        MAX([teacherid]) [teacherid],
        MIN([schedule]) [schedule],
        COUNT(*) Qty
    FROM myY
    GROUP BY rootid