Tags: sql, sql-server, join, inner-join

SQL Server 2016: query performance with join and without join


I have two tables, TABLE1 and TABLE2.

  • TABLE1 has columns masterId, Id, col1, col2, category
  • TABLE2 has columns Id, col1, col2

TABLE2.Id is the primary key of TABLE2, and TABLE1.Id is a foreign key referencing it.

TABLE1.masterId is the primary key of TABLE1.

TABLE1 has 10 million rows, with Id values 1 to 10 million; only the first 10 rows have category = 1.

TABLE2 has only 10 rows with Id 1 to 10.
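
For reference, a minimal T-SQL sketch of that schema; the column data types here are assumptions, since the question does not state them:

CREATE TABLE TABLE2 (
    Id   INT          NOT NULL PRIMARY KEY,  -- 10 rows, Id 1 to 10
    col1 VARCHAR(100) NULL,
    col2 VARCHAR(100) NULL
);

CREATE TABLE TABLE1 (
    masterId INT          NOT NULL PRIMARY KEY,            -- primary key of TABLE1
    Id       INT          NOT NULL REFERENCES TABLE2 (Id), -- foreign key to TABLE2
    col1     VARCHAR(100) NULL,
    col2     VARCHAR(100) NULL,
    category INT          NULL                              -- = 1 only for the first 10 rows
);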

Now I want the col1 and col2 values for rows with category = 1 (from either TABLE1 or TABLE2, since the values are the same in both tables).

Which of the two queries below returns the output faster?

Solution 1:

SELECT T1.col1, T1.col2 
FROM TABLE1 T1
WHERE T1.category = 1

Solution 2:

SELECT T2.col1, T2.col2 
FROM TABLE2 T2 
INNER JOIN TABLE1 T1 ON T1.Id = T2.Id
WHERE T1.category = 1

Does Solution 2 save the table-scan time over the millions of rows in TABLE1?

Limitation: in my real database scenario, I can create a nonclustered index on Table1.Id and another nonclustered index on Table1.category, but I cannot make Table1.Id the clustered index, because my real Table1 already has a different auto-increment column as its primary key.
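
Given that limitation, the two indexes described would look something like this (the index names are only illustrative):

CREATE NONCLUSTERED INDEX IX_Table1_Id ON TABLE1 (Id);
CREATE NONCLUSTERED INDEX IX_Table1_Category ON TABLE1 (category);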

Please confirm and share your thoughts given this limitation.


Solution

  • It depends on the existing indexes. With a nonclustered index on Id in TABLE1, Solution 2 might perform better than Solution 1, which would require a complete table scan to select the rows with category = 1. If instead there is also a nonclustered index on category, Solution 1 will be faster, since it only has to seek that nonclustered index to find the rows.

    Without any index on Id in TABLE1, a full scan would be required to find the rows matching each T2.Id, so Solution 2 might end up doing 10 full scans of TABLE1, against a single full scan filtered on category for Solution 1; in that case Solution 1 might be faster. But this depends on the query optimizer, and testing the real case to see the actual execution plans is the best way to answer (a sketch of such a test follows at the end of this answer).

    But the way to go is to implement the right model first and then create the indexes needed to make the query run fast.

    Edit: adapted the answer according to the query edits. Edit 2: making the index covering would be expensive, while 10 index seeks on the primary key of TABLE1 would not cost much.
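
    As a sketch of how such a test could look (assuming the nonclustered indexes on Id and category described in the question are in place), compare the logical reads and the actual execution plans of both queries; in SSMS, enable "Include Actual Execution Plan" before running:

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- Solution 1: can seek the nonclustered index on category
    SELECT T1.col1, T1.col2
    FROM TABLE1 T1
    WHERE T1.category = 1;

    -- Solution 2: can seek the nonclustered index on Id, once per TABLE2 row
    SELECT T2.col1, T2.col2
    FROM TABLE2 T2
    INNER JOIN TABLE1 T1 ON T1.Id = T2.Id
    WHERE T1.category = 1;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;

    The covering index mentioned in Edit 2 would be a variant like the following (wider, so more expensive to maintain across 10 million rows), which would let Solution 1 avoid key lookups; the index name is only illustrative:

    CREATE NONCLUSTERED INDEX IX_Table1_Category_Covering
        ON TABLE1 (category) INCLUDE (col1, col2);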