Search code examples
sql-serverwhere-clausedivide-by-zerocross-join

How does the outer WHERE clause affect the way nested query is executed?


Let's say I have a table lines

 b | a
-----------
 17  7000
 17  0
 18  6000
 18  0
 19  5000
 19  2500

I want to get positive values of a function: (a1 - a2) \ (b2 - b1) for all elements in cartesian product of lines with different b's. (If you are interested this will result in intersections of lines y1 = b1*x + a1 and y2 = b2*x + a2)

I wrote query1 for that cause

SELECT temp.point FROM
    (SELECT DISTINCT ((l1.a - l2.a) / (l2.b - l1.b)) AS point
     FROM lines AS l1
     CROSS JOIN lines AS l2
     WHERE l1.b != l2.b
    ) AS temp
WHERE temp.point > 0

It throws a "division by zero" error. I tried the same query without the WHERE clause (query2) and it works just fine

SELECT temp.point FROM
    (SELECT DISTINCT ((l1.a - l2.a) / (l2.b - l1.b)) AS point
     FROM lines AS l1
     CROSS JOIN lines AS l2
     WHERE l1.b != l2.b
    ) AS temp

as well as the variation with the defined SQL function (query3)

CREATE FUNCTION get_point(@a1 DECIMAL(18, 4), @a2 DECIMAL(18, 4), @b1 INT, @b2 INT)
RETURNS DECIMAL(18, 4)
WITH EXECUTE AS CALLER
AS
BEGIN
    RETURN (SELECT (@a1 - @a2) / (@b2 - @b1))
END
GO
SELECT temp.point FROM
    (SELECT DISTINCT dbo.get_point(l1.a, l2.a, l1.b, l2.b) AS point
     FROM lines AS l1
     CROSS JOIN lines AS l2
     WHERE l1.b != l2.b
    ) AS temp
WHERE temp.point > 0

I have an intuitive assumption that the outer SELECT shouldn't affect the way nested SELECT is executed (or at least shouldn't break it). Even if it is not true that wouldn't explain why query3 works when query1 doesn't.

Could someone explain the principle behind this? That would be much appreciated.


Solution

  • If you want to guarantee that the query will always work, you'd need to wrap your calculation in something like a case statement

    case when l2.b - l1.b = 0
         then null
         else (l1.a - l2.a) / (l2.b - l1.b)
     end
    

    Technically, the optimizer is perfectly free to evaluate conditions in whatever order it expects will be more efficient. The optimizer is free to evaluate the division before the where clause that filters out rows where the divisor would be 0. It is also free to evaluate the where clause first. Your different queries have different query plans which result in different behavior.

    Realistically, though, even though a particular query might have a "good" query plan today, there is no guarantee that the optimizer won't decide in a day, a month, or a year to change the query plan to something that would throw a division by 0 error. I suppose you could decide to use a bunch of hints/ plan guides to force a particular plan with a particular behavior to be used. But that tends to be the sort of thing that bites you in the hind quarters later. Wrapping the calculation in a case (or otherwise preventing the division by 0 error) will be much safer and easier to explain to the next developer.