Let's say I have a table lines
b | a
-----------
17 7000
17 0
18 6000
18 0
19 5000
19 2500
I want to get positive values of a function: (a1 - a2) \ (b2 - b1)
for all elements in cartesian product of lines with different b's. (If you are interested this will result in intersections of lines y1 = b1*x + a1
and y2 = b2*x + a2
)
I wrote query1 for that cause
SELECT temp.point FROM
(SELECT DISTINCT ((l1.a - l2.a) / (l2.b - l1.b)) AS point
FROM lines AS l1
CROSS JOIN lines AS l2
WHERE l1.b != l2.b
) AS temp
WHERE temp.point > 0
It throws a "division by zero" error. I tried the same query without the WHERE clause (query2) and it works just fine
SELECT temp.point FROM
(SELECT DISTINCT ((l1.a - l2.a) / (l2.b - l1.b)) AS point
FROM lines AS l1
CROSS JOIN lines AS l2
WHERE l1.b != l2.b
) AS temp
as well as the variation with the defined SQL function (query3)
CREATE FUNCTION get_point(@a1 DECIMAL(18, 4), @a2 DECIMAL(18, 4), @b1 INT, @b2 INT)
RETURNS DECIMAL(18, 4)
WITH EXECUTE AS CALLER
AS
BEGIN
RETURN (SELECT (@a1 - @a2) / (@b2 - @b1))
END
GO
SELECT temp.point FROM
(SELECT DISTINCT dbo.get_point(l1.a, l2.a, l1.b, l2.b) AS point
FROM lines AS l1
CROSS JOIN lines AS l2
WHERE l1.b != l2.b
) AS temp
WHERE temp.point > 0
I have an intuitive assumption that the outer SELECT shouldn't affect the way nested SELECT is executed (or at least shouldn't break it). Even if it is not true that wouldn't explain why query3 works when query1 doesn't.
Could someone explain the principle behind this? That would be much appreciated.
If you want to guarantee that the query will always work, you'd need to wrap your calculation in something like a case
statement
case when l2.b - l1.b = 0
then null
else (l1.a - l2.a) / (l2.b - l1.b)
end
Technically, the optimizer is perfectly free to evaluate conditions in whatever order it expects will be more efficient. The optimizer is free to evaluate the division before the where
clause that filters out rows where the divisor would be 0. It is also free to evaluate the where
clause first. Your different queries have different query plans which result in different behavior.
Realistically, though, even though a particular query might have a "good" query plan today, there is no guarantee that the optimizer won't decide in a day, a month, or a year to change the query plan to something that would throw a division by 0 error. I suppose you could decide to use a bunch of hints/ plan guides to force a particular plan with a particular behavior to be used. But that tends to be the sort of thing that bites you in the hind quarters later. Wrapping the calculation in a case
(or otherwise preventing the division by 0 error) will be much safer and easier to explain to the next developer.