I have the following scenario
System | Subsystem & Filename | File Load Start Time | File Load End Time |
---|---|---|---|
Alpha | A1 transactiontxt | 2022-06-19 08:00:00 | 2022-06-19 08:00:02 |
Alpha | A2 userscsv | 2022-06-19 08:00:02 | 2022-06-19 08:00:05 |
Alpha | A2 employeescsv | 2022-06-19 08:00:05 | 2022-06-19 08:00:08 |
Alpha | A1 managerscsv | 2022-06-19 08:00:00 | 2022-06-19 08:00:02 |
Alpha | A3 customerscsv | 2022-06-19 08:00:01 | 2022-06-19 08:00:04 |
Gamma | A1 transactiontxt | 2022-06-19 10:00:48 | 2022-06-19 10:00:53 |
Gamma | A2 userscsv | 2022-06-19 10:00:53 | 2022-06-19 10:00:54 |
Gamma | A2 employeescsv | 2022-06-19 10:00:27 | 2022-06-19 10:00:30 |
Gamma | A1 managerscsv | 2022-06-19 10:00:11 | 2022-06-19 10:00:17 |
Gamma | A3 customerscsv | 2022-06-19 10:00:13 | 2022-06-19 10:00:14 |
I want to be able to group the summary statistics by System. The info needed is when the overall started (earliest time), when it ended (latest time), and the time it took for each subsystem to occur, in seconds. From the example above, the result should look as below:
System | Overall System Load Start Time | Overall System Load End Time | A1 Time Taken | A2 Time Taken | A3 Time Taken |
---|---|---|---|---|---|
Alpha | 2022-06-19 08:00:00 | 2022-06-19 08:00:08 | 00:00:02 | 00:00:06 | 00:00:03 |
Gamma | 2022-06-19 10:00:11 | 2022-06-19 10:00:54 | 00:00:11 | 00:00:04 | 00:00:01 |
I cannot find a way to do this in a query, I'm trying to do select subqueries in the select clause for each column, and at the end group by only System. But this is not possible because I'd have to use an aggregate function which is not being supported with case statements in subqueries in the select clause
My approach was something like
SELECT System,
min(StartTime) as 'File Load Start Time',
max(EndTime) as 'File Load End Time',
CASE WHEN SubSystem LIKE 'A1%' THEN SUM(DATEDIFF(s, min(StartTime), max(EndTime))) Else 0 END AS 'A1 Time Taken',
CASE WHEN SubSystem LIKE 'A2%' THEN SUM(DATEDIFF(s, min(StartTime), max(EndTime))) Else 0 END AS 'A2 Time Taken',
CASE WHEN SubSystem LIKE 'A3%' THEN SUM(DATEDIFF(s, min(StartTime), max(EndTime))) Else 0 END AS 'A3 Time Taken'
FROM TABLE GROUP BY SYSTEM
But this does not work because the case statements need to be in a group by clause as well, and I cannot aggregate them. Sum will not work because if the same exact time is used for two subsystems, I don't want it to be summed -- they were done at the same time.
An aggregation function cannot be employed inside another aggregation function SUM(... MIN(...), MAX(...))
. In order to simplify your query, you need to split the operations you carry out in:
WITH cte AS (
SELECT DISTINCT [System],
LEFT([Subsystem & Filename],
CHARINDEX(' ', [Subsystem & Filename])-1) AS [Subsystem],
[File Load Start Time] AS [StartTime],
[File Load End Time] AS [EndTime]
FROM tab
)
SELECT [System],
MIN([StartTime]) AS [Overall System Load Start Time],
MAX([EndTime]) AS [Overall System Load End Time],
SUM(CASE WHEN [Subsystem]='A1'
THEN DATEDIFF(s, [StartTime], [EndTime]) END) AS [A1 Time Taken],
SUM(CASE WHEN [Subsystem]='A2'
THEN DATEDIFF(s, [StartTime], [EndTime]) END) AS [A2 Time Taken],
SUM(CASE WHEN [Subsystem]='A3'
THEN DATEDIFF(s, [StartTime], [EndTime]) END) AS [A3 Time Taken]
FROM cte
GROUP BY [System]
Check the demo here.