In SQL Server 2017, I am trying to group multiple records into a single record when teacherid and customerid are the same and when the scheduled time is immediately 50 minutes after the previous record, ie.
Table: Schedule
----------------------------------------------------------------
id custormerid teacherid schedule
----------------------------------------------------------------
571654 1085 46 2022-02-22 07:00:00.000
571657 1085 46 2022-02-25 07:00:00.000
571658 1085 46 2022-02-26 07:00:00.000
571659 1085 46 2022-02-26 07:50:00.000
571660 1085 46 2022-02-26 08:40:00.000
571666 1085 46 2022-02-26 10:20:00.000
571661 1085 46 2022-02-28 07:00:00.000
571662 1085 46 2022-02-28 07:50:00.000
571663 1085 11 2022-02-28 08:40:00.000
571664 1085 46 2022-02-24 07:00:00.000
571665 1085 46 2022-02-24 07:50:00.000
Desired result
--------------------------------------------------------------------------
custormerid teacherid schedule Qty
--------------------------------------------------------------------------
1085 46 2022-02-22 07:00:00.000 1
1085 46 2022-02-25 07:00:00.000 1
1085 46 2022-02-26 07:00:00.000 3
1085 46 2022-02-26 10:20:00.000 1
1085 46 2022-02-28 07:00:00.000 2
1085 11 2022-02-28 08:40:00.000 1
1085 46 2022-02-24 07:00:00.000 2
DDL Script:
CREATE TABLE [dbo].[Schedule](
[id] [int] NOT NULL,
[custormerid] [int] NULL,
[teacherid] [int] NULL,
[schedule] [datetime] NULL)
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571654, 1085, 46, CAST(N'2022-02-22T07:00:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571657, 1085, 46, CAST(N'2022-02-25T07:00:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571658, 1085, 46, CAST(N'2022-02-26T07:00:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571659, 1085, 46, CAST(N'2022-02-26T07:50:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571660, 1085, 46, CAST(N'2022-02-26T08:40:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571661, 1085, 46, CAST(N'2022-02-28T07:00:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571662, 1085, 46, CAST(N'2022-02-28T07:50:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571663, 1085, 11, CAST(N'2022-02-28T08:40:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571664, 1085, 46, CAST(N'2022-02-24T07:00:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571665, 1085, 46, CAST(N'2022-02-24T07:50:00.000' AS DateTime))
INSERT [dbo].[Schedule] ([id], [custormerid], [teacherid], [schedule]) VALUES (571666, 1085, 46, CAST(N'2022-02-24T10:20:00.000' AS DateTime))
I saw some examples using CETs, but I didn't really understand how to get to that result
I think you do not need any CTE
to solve that problem. By using Group by
you can do that =>
SELECT [custormerid],[teacherid],MIN([schedule]) [schedule],
COUNT(*) Qty
FROM [dbo].[Schedule]
GROUP BY [custormerid],[teacherid],CAST([schedule] AS DATE)
Update: Now, if you want maintain the sequence of the output as you provided in the question then you can use CTE
like below.
WITH CTE AS
(
SELECT MAX([id]) MyID,[custormerid],[teacherid],MIN([schedule]) [schedule],
COUNT(*) Qty
FROM [dbo].[Schedule]
GROUP BY [custormerid],[teacherid],CAST([schedule] AS DATE)
)
SELECT [custormerid],[teacherid],[schedule],Qty FROM CTE ORDER BY MyID
Output:
Final Update:
Now, I have used several techniques to achieve what asked for. You said in the comment that you need count them with the diff of 50 min. So, I did that with multiple CTE
. The query is providing the perfect output I believe for small amount of data but I did not test is for large amount of data. Another thing is that, the performance of query might be slow for large amount of data. So, please try it and let me know =>
WITH myCTE AS
(
SELECT *,
--I have used DENSE_RANK for Ordering the rows by [schedule]
DENSE_RANK() OVER(PARTITION BY [custormerid],[teacherid],CAST([schedule] AS DATE) ORDER BY ID,CAST([schedule] AS DATE)) DRNK
FROM [dbo].[Schedule]
),
CTE AS
(
--Finding the parentID using logic of 50 min and others
SELECT *,
CASE WHEN DATEADD(MINUTE, -50, [schedule])<=LAG([schedule],1) OVER(ORDER BY (SELECT NULL))
AND [teacherid]=LAG([teacherid],1) OVER(ORDER BY (SELECT NULL))
AND [custormerid]=LAG([custormerid],1) OVER(ORDER BY (SELECT NULL))
THEN LAG(ID,1) OVER(ORDER BY (SELECT NULL))
ELSE NULL END AS parentid
FROM myCTE
),
myY AS
(
--Grouping the items Using Tree concept
SELECT CTE.*, id AS rootid FROM CTE
WHERE parentid IS NULL
UNION ALL
SELECT C.*, P.rootid FROM myY AS P
INNER JOIN CTE AS C ON P.id = C.parentid
)
SELECT
MAX([custormerid]) [custormerid],
MAX([teacherid]) [teacherid],
MIN([schedule]) [schedule],
COUNT(*) Qty
FROM myY
GROUP BY rootid