Search code examples
sqlsql-serversql-server-2008

Can SELECT expressions sometimes be evaluated for rows not matching WHERE clause?


I would like to know if it's possible for expressions that are part of the SELECT statement list to be evaluated for rows not matching the WHERE clause?

From the execution order documented here, it seems that the SELECT gets evaluated long after the WHERE, however I ran into a very weird problem with a real-life query similar to the query below.

To put you in context, in the example, the SomeOtherTable has a a_varchar column which always contains numerical values for the code 105, but may contain non-numerical values for other codes.

The query statement works:

    SELECT an_id, an_integer FROM SomeTable

    UNION ALL

    SELECT an_id, CAST(a_varchar AS int)
    FROM SomeOtherTable
    WHERE code = 105

The following query complains about being unable to cast a_varchar to int:

SELECT 1
FROM (
    SELECT an_id, an_integer FROM SomeTable

    UNION ALL

    SELECT an_id, CAST(a_varchar AS int)
    FROM SomeOtherTable
    WHERE code = 105
) i
INNER JOIN AnotherOne a
    ON a.an_id = i.an_id

And finally, the following query works:

SELECT 1
FROM (
    SELECT an_id, an_integer FROM SomeTable

    UNION ALL

    SELECT 
        an_id, 
        CASE code WHEN 105 THEN CAST(a_varchar AS int) ELSE NULL END
    FROM SomeOtherTable
    WHERE code = 105
) i
INNER JOIN AnotherOne a
    ON a.an_id = i.an_id

Therefore, the only explanation I could find was that with the JOIN, the query gets optimized differently in a way that CAST(a_varchar AS int) gets executed even if code <> 105.

The queries are run against SQL SERVER 2008.


Solution

  • Absolutely.

    The documentation that you reference has a section called Logical Processing Order of the SELECT statement. This is not the physical processing order. It explains how the query itself is interpreted. For instance, an alias defined in the select clause cannot be references in the where clause, because the where clause is logically processed first.

    In fact, SQL Server has the ability to optimize queries by doing various data transformation operations when it reads the data. This is a nice performance benefit, because the data is in memory, locally, and the operations can simply be done in place. However, the following can fail with a run-time error:

    select cast(a_varchar as int)
    from table t
    where a_varchar not like '%[^0-9]%';
    

    The filter is applied after the attempt at conversion, in the real process flow. I happen to consider this a bug; presumably, the folks at Microsoft do not think so, because they have not bothered to fix this.

    Two workarounds are available. The first is try_convert(), which does conversions and returns NULL for a failure instead of a run-time error. The second is the case expression:

    select (case when a_varchar not like '%[^0-9]%' then cast(a_varchar as int) end)
    from table t
    where a_varchar not like '%[^0-9]%';