I'm trying to find ways to reduce the time required to execute this query. Right now, it takes anywhere from 30-90 minutes depending on the server. I'm still learning about indexes so that's an option for future projects, and I'm also looking for suggestions that might help optimize the query itself. The databases are pretty large, containing anywhere from 10-100 million rows in the main table. Any suggestions would be helpful!
(I've changed column names to be more descriptive for this post)
USE MyDB
DECLARE @interval int = 6
DECLARE @mindate DATETIME = (SELECT DATEADD(HOUR, DATEDIFF(HOUR,0,MIN(DateColumn))/@interval*@interval, 0) FROM MyTable1)
DECLARE @maxdate DATETIME = (SELECT DATEADD(HOUR, DATEDIFF(HOUR,0,MAX(DateColumn))/@interval*@interval, 0) FROM MyTable1)
DECLARE @end DATETIME = (SELECT DATEADD(HOUR, DATEDIFF(HOUR,0,GETDATE())/@interval*@interval, 0))
--=========================================================
-- Update all DATEADD statements to desired interval
--=========================================================
;WITH cte
AS (SELECT
[StartTime] = CONVERT(datetime,@mindate),
[EndTime] = DATEADD(HOUR, @interval, CONVERT(datetime,@mindate))
UNION ALL
SELECT
DATEADD(HOUR, @interval, [StartTime]),
DATEADD(HOUR, @interval, [EndTime])
FROM cte
WHERE [EndTime] < @end)
--==========================================================
SELECT
[ServerName] = ServerName,
[StartTime] = CONVERT(varchar, [StartTime], 121),
[EndTime] = CONVERT(varchar, [EndTime], 121),
[ItemsMigrated] = COUNT(ItemIDColumn),
[SizeMigrated] = ISNULL(SUM(ItemSizeColumn),0),
[ItemsFailed] = (SELECT COUNT(*) FROM MyTable2 WHERE ItemStatusColumn = 'Failed')
FROM cte
LEFT JOIN MyTable1 cr ON cr.DateColumn >= [StartTime] AND cr.DateColumn < [EndTime]
GROUP BY cr.ServerName, cte.StartTime, cte.EndTime
ORDER BY cte.StartTime
OPTION (MAXRECURSION 0)
Combine the two SELECT statements that each scan the entire table into a single pass:
DECLARE @mindate DATETIME = (SELECT DATEADD(HOUR, DATEDIFF(HOUR,0,MIN(DateColumn))/@interval*@interval, 0) FROM MyTable1)
DECLARE @maxdate DATETIME = (SELECT DATEADD(HOUR, DATEDIFF(HOUR,0,MAX(DateColumn))/@interval*@interval, 0) FROM MyTable1)
can be
DECLARE @mindate DATETIME
DECLARE @maxdate DATETIME
SELECT @mindate = DATEADD(HOUR, DATEDIFF(HOUR,0,MIN(DateColumn))/@interval*@interval, 0),
@maxdate = DATEADD(HOUR, DATEDIFF(HOUR,0,MAX(DateColumn))/@interval*@interval, 0)
FROM MyTable1
In addition, casting a datetime to a datetime has no effect (the compiler will ignore it), but you are also converting a datetime to a varchar for every row, and that adds up across 100 million rows. Instead, do the conversion once inside the CTE, since these datetimes don't change:
SELECT
[StartTime] = @mindate,
[EndTime] = DATEADD(HOUR, @interval, @mindate),
[StartTimeS] = CONVERT(varchar, @mindate, 121),
[EndTimeS] = CONVERT(varchar, DATEADD(HOUR, @interval, @mindate), 121),
[ItemsFailed] = (SELECT COUNT(*) FROM MyTable2 WHERE ItemStatusColumn = 'Failed')
UNION ALL
SELECT
DATEADD(HOUR, @interval, [StartTime]),
DATEADD(HOUR, @interval, [EndTime]),
CONVERT(varchar, DATEADD(HOUR, @interval, [StartTime]), 121),
CONVERT(varchar, DATEADD(HOUR, @interval, [EndTime]), 121),
[ItemsFailed]
FROM cte
WHERE [EndTime] < @end)
Then you have
SELECT
[ServerName] = ServerName,
[StartTime] = StartTimeS,
[EndTime] = EndTimeS,
[ItemsMigrated] = COUNT(ItemIDColumn),
[SizeMigrated] = ISNULL(SUM(ItemSizeColumn),0),
[ItemsFailed]
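Putting the pieces together, the whole query might look like this (a sketch only, untested against your schema; I've kept your renamed tables and columns, and given the varchar an explicit length, since relying on the default length is a common source of silent truncation):

```sql
USE MyDB;

DECLARE @interval int = 6;
DECLARE @mindate DATETIME, @maxdate DATETIME;

-- One scan of MyTable1 computes both bounds
SELECT @mindate = DATEADD(HOUR, DATEDIFF(HOUR, 0, MIN(DateColumn)) / @interval * @interval, 0),
       @maxdate = DATEADD(HOUR, DATEDIFF(HOUR, 0, MAX(DateColumn)) / @interval * @interval, 0)
FROM MyTable1;

DECLARE @end DATETIME = DATEADD(HOUR, DATEDIFF(HOUR, 0, GETDATE()) / @interval * @interval, 0);

;WITH cte AS (
    SELECT
        [StartTime]   = @mindate,
        [EndTime]     = DATEADD(HOUR, @interval, @mindate),
        -- varchar conversions done once per interval, not once per data row
        [StartTimeS]  = CONVERT(varchar(23), @mindate, 121),
        [EndTimeS]    = CONVERT(varchar(23), DATEADD(HOUR, @interval, @mindate), 121),
        [ItemsFailed] = (SELECT COUNT(*) FROM MyTable2 WHERE ItemStatusColumn = 'Failed')
    UNION ALL
    SELECT
        DATEADD(HOUR, @interval, [StartTime]),
        DATEADD(HOUR, @interval, [EndTime]),
        CONVERT(varchar(23), DATEADD(HOUR, @interval, [StartTime]), 121),
        CONVERT(varchar(23), DATEADD(HOUR, @interval, [EndTime]), 121),
        [ItemsFailed]
    FROM cte
    WHERE [EndTime] < @end
)
SELECT
    [ServerName]    = cr.ServerName,
    [StartTime]     = cte.StartTimeS,
    [EndTime]       = cte.EndTimeS,
    [ItemsMigrated] = COUNT(cr.ItemIDColumn),
    [SizeMigrated]  = ISNULL(SUM(cr.ItemSizeColumn), 0),
    [ItemsFailed]   = cte.ItemsFailed
FROM cte
LEFT JOIN MyTable1 cr
    ON cr.DateColumn >= cte.StartTime AND cr.DateColumn < cte.EndTime
GROUP BY cr.ServerName, cte.StartTime, cte.EndTime,
         cte.StartTimeS, cte.EndTimeS, cte.ItemsFailed
ORDER BY cte.StartTime
OPTION (MAXRECURSION 0);
```

Note that ServerName (and the pass-through CTE columns) have to appear in the GROUP BY for this to compile; your original query selected ServerName without grouping by it.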
But remember, 100 million rows is still 100 million rows; it will take some time. If you are doing any joins, you will need an index to avoid a serious performance hit.
(For example, a nested-loop join against even a 10-row table without an index can mean scanning your 100-million-row table once per row, i.e. on the order of a billion additional reads.)
With 100 million rows, creating the right indexes should be high on your list.
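As a starting point (a sketch only; the index names are mine, and you should adjust the column lists to what your real table has), an index keyed on the date column that covers the other columns the query reads lets the join become a range seek per interval instead of repeated table scans:

```sql
-- Seek on the DateColumn range; INCLUDE avoids key lookups for the
-- other columns the query reads. Building this on a 100M-row table
-- takes time and disk space, so do it during a quiet window.
CREATE NONCLUSTERED INDEX IX_MyTable1_DateColumn
ON MyTable1 (DateColumn)
INCLUDE (ServerName, ItemIDColumn, ItemSizeColumn);

-- The failed-items subquery counts matching rows in MyTable2; a
-- filtered index (SQL Server 2008+) makes that count a small scan.
CREATE NONCLUSTERED INDEX IX_MyTable2_Failed
ON MyTable2 (ItemStatusColumn)
WHERE ItemStatusColumn = 'Failed';
```

Check the actual execution plan afterwards: if the optimizer still scans MyTable1, the index isn't covering everything the query touches.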