Search code examples
sqlsql-serverquery-performance

Why is query with phone = N'1234' slower than phone = '1234'?


I have a field which is a varchar(20)

When this query is executed, it is fast (Uses index seek):

SELECT * FROM [dbo].[phone] WHERE phone = '5554474477'

But this one is slow (uses index scan).

SELECT * FROM [dbo].[phone] WHERE phone = N'5554474477'

I am guessing that if I change the field to an nvarchar, then it would use the Index Seek.


Solution

  • Because nvarchar has higher datatype precedence than varchar so it needs to perform an implicit cast of the column to nvarchar and this prevents an index seek.

    Under some collations it is able to still use a seek and just push the cast into a residual predicate against the rows matched by the seek (rather than needing to do this for every row in the entire table via a scan) but presumably you aren't using such a collation.

    The effect of collation on this is illustrated below. When using the SQL collation you get a scan, for the Windows collation it calls the internal function GetRangeThroughConvert and is able to convert it into a seek.

    CREATE TABLE [dbo].[phone]
      (
         phone1 VARCHAR(500) COLLATE sql_latin1_general_cp1_ci_as CONSTRAINT uq1 UNIQUE,
         phone2 VARCHAR(500) COLLATE latin1_general_ci_as CONSTRAINT uq2 UNIQUE,
      );
    
    SELECT phone1 FROM [dbo].[phone] WHERE phone1 = N'5554474477';
    SELECT phone2 FROM [dbo].[phone] WHERE phone2 = N'5554474477';
    

    enter image description here

    The SHOWPLAN_TEXT is below

    Query 1

      |--Index Scan(OBJECT:([tempdb].[dbo].[phone].[uq1]),  WHERE:(CONVERT_IMPLICIT(nvarchar(500),[tempdb].[dbo].[phone].[phone1],0)=CONVERT_IMPLICIT(nvarchar(4000),[@1],0)))
    

    Query 2

      |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1005], [Expr1006], [Expr1004]))
           |--Compute Scalar(DEFINE:(([Expr1005],[Expr1006],[Expr1004])=GetRangeThroughConvert([@1],[@1],(62))))
           |    |--Constant Scan
           |--Index Seek(OBJECT:([tempdb].[dbo].[phone].[uq2]), SEEK:([tempdb].[dbo].[phone].[phone2] > [Expr1005] AND [tempdb].[dbo].[phone].[phone2] < [Expr1006]),  WHERE:(CONVERT_IMPLICIT(nvarchar(500),[tempdb].[dbo].[phone].[phone2],0)=[@1]) ORDERED FORWARD)
    

    In the second case the compute scalar emits the following values

    Expr1004 = 62
    Expr1005 = '5554474477'
    Expr1006 = '5554474478'
    

    the seek predicate shown in the plan is on phone2 > Expr1005 and phone2 < Expr1006 so on the face of it would exclude '5554474477' but the flag 62 means that this does match.

    NB: The range that the seek will cover will depend on the length of the string you are performing the equality on here.

    For the predicate = N'a' for example the seek will still read the entire range of index values beginning with an a and have a residual predicate preserving only those matching = 'a'. The predicate = N'' is even worse. With the zero length prefix it ends up reading the whole index.