I came across a query that had a condition in the HAVING section of the query that I would have typically placed in the WHERE section. I preceded to move the condition to the WHERE section and to my surprise the code runs about 200%+ longer using WHERE instead of HAVING.
This seamed odd to me and I have yet to be able to find anything online that describes this.
Here is an example of how the query is laid out:
This query runs in 50 to 55 seconds consistently.
SELECT TB1.COL1
,TB1.COL2
,TB2.COL3
,TB3.COL4
,TB1.SOME_ID
FROM TABLE1 TB1
JOIN TABLE2 TB2
JOIN TABLE3 TB3
ON TB1.SOME_ID = TB2.SOME_ID
ON TB1.SOME_ID = TB3.SOME_ID
GROUP BY TB1.COL1, TB1.COL2, TB2.COL3, TB3.COL4, TB1.SOME_ID
HAVING TB1.SOME_ID = 9999999
However this query runs in 120 to 130 seconds consistently.
SELECT TB1.COL1
,TB1.COL2
,TB2.COL3
,TB3.COL4
,TB1.SOME_ID
FROM TABLE1 TB1
JOIN TABLE2 TB2
JOIN TABLE3 TB3
ON TB1.SOME_ID = TB2.SOME_ID
ON TB1.SOME_ID = TB3.SOME_ID
WHERE TB1.SOME_ID = 9999999
GROUP BY TB1.COL1, TB1.COL2, TB2.COL3, TB3.COL4, TB1.SOME_ID
I am confused as to why the HAVING would ever run quicker than the WHERE statement for this kind of condition. I mean its not an aggregate or anything so I would normally use WHERE in this case but testing has shown this to be slower...
Any thoughts?
UPDATE:
Here is the original query, Please note I did not write this query and it does work just fine. I am simply tying to understand why the HAVING is faster than using the same condition in WHERE:
SELECT TB1.COL1
,TABLE302.COL2
,TABLE314.COL3
,TABLE314.COL4
,TABLE312.COL5
,TABLE314.COL6
,TABLE302.COL7
,TABLE320.COL8
,TABLE320.COL9
,TABLE302.COL10
,TABLE302.COL11
,TABLE230.COL12
,TABLE100.COL13
,TABLE100.COL14
,TABLE104.COL15
,TABLE110.COL16
,TABLE230.COL17
,TABLE230.COL18
,TB1.COL19
,IIf([TABLE230.CD]>' '
And format(cast(TABLE230.DT as date),'yyyy-MM-dd') = format(cast('12/31/9999' as date),'yyyy-MM-dd'), TABLE230.AMT,TB1.O) AS [FA]
,TABLE230.COL20
,TABLE230.COL21
,format(cast(DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0) as date),'yyyy-MM-dd') AS [TODAY]
,TABLE104.COL22
,TABLE320.COL23
,TABLE180.COL24
,Count(TABLE100.COL14) AS [COUNT_COL14]
,IIf(format(cast(TABLE230.FDT as date),'yyyy-MM-dd') = format(cast('1/1/1901' as date),'yyyy-MM-dd')
And format(cast(TABLE230.DDT as date),'yyyy-MM-dd') = format(cast('1/1/1901' as date),'yyyy-MM-dd'),'NOT ACTIVE','ACTIVE') AS [ST]
,TABLE230.COL25
FROM TABLE1 TB1
RIGHT JOIN TABLE302
INNER JOIN TABLE114
INNER JOIN TABLE130
INNER JOIN TABLE110
INNER JOIN TABLE126
INNER JOIN TABLE104
INNER JOIN TABLE124
INNER JOIN TABLE400
INNER JOIN TABLE120
ON TABLE400.CYP = TABLE120.CYP
INNER JOIN TABLE100
ON TABLE120.PID = TABLE100.COL13
ON TABLE124.PID = TABLE100.COL13
INNER JOIN TABLE230
ON TABLE120.PID = TABLE230.PID
AND TABLE120.CYP = TABLE230.CYP
ON TABLE104.PID = TABLE230.PID
AND TABLE104.PID = TABLE124.PID
ON TABLE126.PID = TABLE104.PID
ON TABLE110.EID = TABLE230.EID
AND TABLE110.EID = TABLE126.EID
ON TABLE130.GCD = TABLE230.GCD
AND TABLE130.EID = TABLE110.TID
ON TABLE114.GCD = TABLE130.GCD
ON TABLE302.COL7 = TABLE230.LD
AND TABLE302.COL10 = TABLE400.HP
AND TABLE302.COL11 = TABLE400.SP
INNER JOIN TABLE180
INNER JOIN TABLE320
ON TABLE180.RN = TABLE320.COL23
ON TABLE302.COL7 = TABLE320.CID
INNER JOIN TABLE314
ON TABLE302.ZID = TABLE314.COL4
ON TB1.COL1 = TABLE230.GCD
LEFT JOIN TABLE312
ON TABLE314.COL4 = TABLE312.TID
GROUP BY TB1.COL1
,TABLE302.COL2
,TABLE314.COL3
,TABLE314.COL4
,TABLE312.COL5
,TABLE314.COL6
,TABLE302.COL7
,TABLE320.COL8
,TABLE320.COL9
,TABLE302.COL10
,TABLE302.COL11
,TABLE230.COL12
,TABLE100.COL13
,TABLE100.COL14
,TABLE104.COL15
,TABLE110.COL16
,TABLE230.COL17
,TABLE230.COL18
,TB1.COL19
,IIf([TABLE230.CD]>' '
And format(cast(TABLE230.DT as date),'yyyy-MM-dd') = format(cast('12/31/9999' as date),'yyyy-MM-dd'), TABLE230.AMT,TB1.O)
,TABLE230.COL20
,TABLE230.COL21
,format(cast(DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0) as date),'yyyy-MM-dd')
,TABLE104.COL22
,TABLE320.COL23
,TABLE180.COL24
,IIf(format(cast(TABLE230.FDT as date),'yyyy-MM-dd') = format(cast('1/1/1901' as date),'yyyy-MM-dd')
And format(cast(TABLE230.DDT as date),'yyyy-MM-dd') = format(cast('1/1/1901' as date),'yyyy-MM-dd'),'NOT ACTIVE','ACTIVE')
,TABLE230.COL25
,TABLE230.FDT
,TABLE230.MDT
,TABLE230.NCD
,TABLE114.GCD
,TABLE230.MDT
HAVING TABLE314.COL4 = 99999999 -- If I move this line to WHERE it runs 2x longer
and format(cast(TABLE230.MDT as date),'yyyy-MM-dd') > format(cast('12/31/2019' as date),'yyyy-MM-dd')
The execution plan appear to be different as well.
The difference is the execution plan. You would have to look at the execution plans for two two queries to spot the differences.
In my experience, the difference is often due to the ability to use an index for the GROUP BY
. The filtering in the WHERE
prevents the use of the index. However, that is not the case for your query because it is aggregating by columns from multiple tables.
Another possibility is that the filter removes relatively few records, but affects the execution plan of the JOIN
s. I suspect this is the cause of what you are seeing. You would need to look at the execution plan to see if the joins are the same.