Search code examples
sqlperformancesql-server-2012execution-timedatabase-tuning

SQL Server Query Join taking very long when adding an additional column in the join


I'm running a basic query on a table with a unique ID in each table, which is a non-clustered index in each. One table has 8 million rows, and the other has 800,000 rows.

When I run the following, it returns 24,000 rows in less than a second:

select     
    a.[ID]      
    ,b.[ID]    
from     
    dbo.tbl_1 a    
join     
    dbo.tbl_2 b    
on 
    a.unique_id = b.unique_id 

However, when I add an additional column in the join which will significantly reduce the recordset, this takes about 8 minutes

select     
    a.[ID]      
    ,b.[ID]    
from     
    dbo.tbl_1 a    
join     
    dbo.tbl_2 b    
on 
    a.unique_id = b.unique_id 
    AND a.code_letter = b.code_letter

The "code_letter" column is just one letter, and is set to varchar(1). I'm scratching my head trying to understand why this is hanging. The issue is that I've run a dynamic sql insert query with 20,000 permutations of joins, and it's taking way too long.

Edit After trying this many ways, I realized that a simple select * seems to work very efficiently, while selecting specific columns is the culprit. Here is the execution plan when I select *:

select all query

Here is the execution plan when I select specific columns:

select specific columns

Again, my join is exactly the same, but the column selections are different.


Solution

  • OP said, he didn't get the expected results, and based on his observations, I'll provide a different solution.

    What I would do is, execute the following and get data to a temp table

    select     
        a.[ID] as aID      
        ,b.[ID] as bID
        ,a.code_letter as aCode_letter  
        ,b.code_letter as bCode_letter
    into #t
    from     
        dbo.tbl_1 a    
    join     
        dbo.tbl_2 b    
    on 
        a.unique_id = b.unique_id 
    

    and then exec the following

    Select aID, bID from #t Where aCode_letter = bCode_letter
    

    select DISTINCT a.*    
    into #ta
    from     
        dbo.tbl_1 a    
    join     
        dbo.tbl_2 b    
    on 
        a.unique_id = b.unique_id 
    
    select DISTINCT b.*    
    into #tb
    from     
        dbo.tbl_1 a    
    join     
        dbo.tbl_2 b    
    on 
        a.unique_id = b.unique_id 
    

    And exec

    Select a.ID, b.ID 
    from #ta a
    Inner Join #tb b 
    on a.unique_id = b.unique_id and a.Code_letter = b.Code_letter