I am testing probe residual, joining tables with mismatching datatypes to see how that hurts performance. In this test, I join two tables, once with mismatching datatypes (nvarchar versus varchar) and once with matching datatypes. The code I used:
SELECT REPLACE (CONVERT(VARCHAR(255), NEWID()), '-', '') AS ColA,
ROW_NUMBER() OVER (ORDER BY ( SELECT 1)) AS RowNumb
INTO #Varchar
FROM sys.all_objects AS S
CROSS JOIN sys.all_objects AS S2
SELECT CAST (ColA AS NVARCHAR (255)) AS ColA,
RowNumb
INTO #Nvarchar
FROM #Varchar
SELECT TOP 1000000 *
INTO #Subset
FROM #Varchar
CREATE INDEX i_VarcharColA ON #Varchar (ColA)
CREATE INDEX i_NvarcharColA ON #Nvarchar (ColA)
CREATE INDEX i_SubsetColA ON #Subset (ColA)
SET STATISTICS IO, TIME ON
SELECT COUNT(*) AS CountStar
FROM #Varchar AS V
INNER JOIN #Subset AS S ON V.ColA = S.ColA -- matching datatypes
SELECT COUNT(*) AS CountStar
FROM #Nvarchar AS V
INNER JOIN #Subset AS S ON V.ColA = S.ColA -- non-matching datatypes
There is a big difference in performance. To my absolute surprise however, using the mismatching datatypes actually performs way better.
I must be overlooking something, but any insight would be greatly appreciated.
Thanks a lot!
INNER JOIN matching type -> NL JOIN CPU time = 15251 ms, elapsed time = 4848 ms. INNER JOIN non-matching type -> MERGE JOIN + SORT CPU time = 15889 ms, elapsed time = 9776 ms.
INNER JOIN matching type -> NL JOIN CPU time = 8438 ms, elapsed time = 9307 ms. INNER JOIN non-matching type -> NL JOIN + sort CPU time = 8687 ms, elapsed time = 10130 ms.
Earlier versions may have a different plan, I so expect that non-matching type get HASH JOIN and more CPU.
p/s Sql server tries to create a cheaper plan, saving resources at the expense of speed of execution