Search code examples
sqlt-sqlquery-optimization

Order in multiple "on" conditions in INNER Join


My question is : is there any order of execution in ON clauses in a typical INNER JOIN?

My tables are filled in about a million entries each table, I need to join them easily and I want to optimize my query.

Here's my example:

SELECT *

FROM TableA
INNER JOIN TableB ON 

(
    [costly validations]
) 
AND

TableB.Date1 > '2016-01-01' AND
TableA.Date2 > '2016-01-01' AND
TableB.Date1 = TableA.Date2 AND
TableA.Field1 = TableB.Field1 AND
TableA.Field2 = TableB.Field2 AND
TableA.Field3 = TableB.Field3 AND
TableA.Field4 = TableB.Field18 

where the costly validations are:

(
    CAST(
        SUBSTRING(
            Convert(nvarchar(50), TableA.MoneyField1), 
            2, 
            (len(Convert(nvarchar(50), TableA.MoneyField1)) - 1)
        ) as money
    ) = TableB.MoneyField2
    AND 
    len(
        Convert(
            nvarchar(50), 
            CAST(
                SUBSTRING(
                    Convert(nvarchar(50), TableA.MoneyField1), 
                    2, 
                    (len(Convert(nvarchar(50), TableA.MoneyField1)) - 1)
                ) as money
            )
        )
    ) = len(TableB.MoneyField2)
)  

Validations ideas : We were removing a dollar sign ($) in our very old applications from some strings received by a third party application before converting the string into money. The third party application removed the $ sign and now we are removing a digit from our numbers (oops). The applications are fixed but our data is corrupted.

Question is: is the order in the ON clause has any effect on the performance or optimization of the query? For example, if I put the costly validation at the end of the ON clause, is there any changes (I don't see any on my side, I tried).

My assumption is No since T-SQL is smarter than me.

so this:

ON
(
    [costly validations]
) 
AND

TableB.Date1 > '2016-01-01' AND
TableA.Date2 > '2016-01-01' AND
TableB.Date1 = TableA.Date2 AND
TableA.Field1 = TableB.Field1 AND
TableA.Field2 = TableB.Field2 AND
TableA.Field3 = TableB.Field3 AND
TableA.Field4 = TableB.Field18 

versus this :

ON
TableB.Date1 > '2016-01-01' AND
TableA.Date2 > '2016-01-01' AND
TableB.Date1 = TableA.Date2 AND
TableA.Field1 = TableB.Field1 AND
TableA.Field2 = TableB.Field2 AND
TableA.Field3 = TableB.Field3 AND
TableA.Field4 = TableB.Field18 AND
(
    [costly validations]
) 

Solution

  • There is no guarantee on the order of evaluation. Both within a single ON clause and across multiple ON clauses and the WHERE clause.1

    All that is required by the standard is that certain clauses are evaluated in a particular logical order, with database systems free to re-order evaluation provided they produce results consistent with the logical order. SQL goes even further beyond that, in some cases though - leading to the situation that it's very difficult to write code with guaranteed guards (e.g. if you want to do a/b and want to avoid the division by zero error, you have to jump through hoops to ensure you will not ever see that error. You can't just write b!=0 and a/b = <something>)


    1This is unlike a lot of other languages in which their boolean AND and OR operators are defined to perform short-circuiting, and only evaluate their right-hand expression if the left-hand expression is insufficient to determine the entire result.