Search code examples
sql-servert-sqlsql-server-2014sqlperformance

Using GUID string in JOIN condition messes up execution time


I recognized a strange behavior. Given the following query:

SELECT * FROM foo
JOIN bar ON bar.id = foo.bar_id AND bar.other_id = '00000000-0000-0000-0000-000000000000'

Execution time on my server: ~120 ms, Reads: ~125000

When I rewrite the query like this:

DECLARE @other_id uniqueidentifier = '00000000-0000-0000-0000-000000000000'
SELECT * FROM foo
JOIN bar ON bar.id = foo.bar_id AND bar.other_id = @other_id

Execution time: ~6ms, Reads: ~140

Can anyone explain why the first query is so much slower? I assume the GUID string is converted over and over again in the query, but I would expect SQL server to be smart enough to not do this.


Solution

  • I've managed to reproduce your problem. And Dan Guzman (+1) was on the right lines. Here's a demo of what's going on:

    Create a table and dataset with the below code:

    CREATE TABLE dbo.Test(ID INT IDENTITY (1,1),Val UNIQUEIDENTIFIER)
    GO
    INSERT INTO dbo.Test
    SELECT NEWID() FROM sys.columns
    GO 30
    INSERT INTO dbo.Test
    SELECT TOP 1000 Val FROM dbo.Test
    GO 30
    
    CREATE UNIQUE CLUSTERED INDEX idx ON dbo.Test(ID)
    CREATE NONCLUSTERED INDEX idx2 ON dbo.Test(Val)
    GO
    

    Now look at your statistics:

    DBCC SHOW_STATISTICS ('dbo.test',IDX2)
    

    In EQ_ROWS you will see whole numbers (a count of duplicates matching the Hi Key), In AVG_RANGE_ROWS you will see decimal averages (mean rows per distinct key)

    Run the next statement to identify a key that is or isn't duplicated.

    SELECT Val, COUNT(*) FROM dbo.Test GROUP BY Val HAVING COUNT(*) = 1
    SELECT Val, COUNT(*) FROM dbo.Test GROUP BY Val HAVING COUNT(*) > 1
    

    If you run the below code with a Non-Unique [Val] then the execution plans will be identical because SQL produces plans based on AVG_RANGE_ROWS.

    If you run the code with a Unique [Val] then the Plans are slightly different, the Estimated number of rows changes, the query with the Parameter will use AVG_RANGE_ROWS, the query with the hard coded value will use EQ_ROWS. This difference can send the optimiser into producing a completely different plan on more complicated environments.

    DECLARE @r UNIQUEIDENTIFIER = 'CE043987-62B5-4AA6-9BE7-0005F2B54A24' 
    
    SELECT * FROM dbo.Test WHERE Val = @r
    
    SELECT * FROM dbo.Test WHERE Val = 'CE043987-62B5-4AA6-9BE7-0005F2B54A24'