Search code examples
sql-servert-sqlperformance-testing

T-SQL Probe residual


I am testing probe residual, joining tables with mismatching datatypes to see how that hurts performance. In this test, I join two tables, once with mismatching datatypes (nvarchar versus varchar) and once with matching datatypes. The code I used:

SELECT REPLACE (CONVERT(VARCHAR(255), NEWID()), '-', '') AS ColA,
      ROW_NUMBER() OVER (ORDER BY ( SELECT 1)) AS RowNumb
INTO #Varchar
FROM sys.all_objects AS S
CROSS JOIN sys.all_objects AS S2

SELECT CAST (ColA AS NVARCHAR (255)) AS ColA,
       RowNumb
INTO #Nvarchar
FROM #Varchar

SELECT TOP 1000000 *
INTO #Subset
FROM #Varchar

CREATE INDEX i_VarcharColA ON #Varchar (ColA)
CREATE INDEX i_NvarcharColA ON #Nvarchar (ColA)
CREATE INDEX i_SubsetColA ON #Subset (ColA)

SET STATISTICS IO, TIME ON
SELECT COUNT(*) AS CountStar
FROM #Varchar AS V
INNER JOIN #Subset AS S ON V.ColA = S.ColA -- matching datatypes

SELECT COUNT(*) AS CountStar
FROM #Nvarchar AS V
INNER JOIN #Subset AS S ON V.ColA = S.ColA -- non-matching datatypes

There is a big difference in performance. To my absolute surprise however, using the mismatching datatypes actually performs way better.

I must be overlooking something, but any insight would be greatly appreciated.

Thanks a lot!


Solution

    1. Add OPTION (RECOMPILE, MAXDOP 1) for clear test
    2. Test with DBCC DROPCLEANBUFFERS (remove all buffers from the buffer pool)
    3. In my sql (2016) first query "faster"
    4. test without MAXDOP = 1:
     INNER JOIN matching type -> NL JOIN
     CPU time = 15251 ms,  elapsed time = 4848 ms.
    
     INNER JOIN non-matching type -> MERGE JOIN + SORT 
     CPU time = 15889 ms,  elapsed time = 9776 ms.
    
    1. test with MAXDOP = 1
    INNER JOIN matching type -> NL JOIN    
    CPU time = 8438 ms,  elapsed time = 9307 ms.
    
    INNER JOIN non-matching type -> NL JOIN + sort    
    CPU time = 8687 ms,  elapsed time = 10130 ms.
    

    Earlier versions may have a different plan, I so expect that non-matching type get HASH JOIN and more CPU.

    p/s Sql server tries to create a cheaper plan, saving resources at the expense of speed of execution