Search code examples
sqlsql-serverleft-joinsql-optimization

Left join not being optimized


On a SQL Server database, consider a classical parent-child relation like the following:

create table Parent(
p_id uniqueidentifier primary key,
p_col1 int,
p_col2 int
);

create table Child(
c_id uniqueidentifier primary key,
c_p uniqueidentifier foreign key references Parent(p_id)
);

declare @Id int
set @Id = 1
while @Id <= 10000
begin
insert into Parent(p_id, p_col1, p_col2) values (NEWID(), @Id, @Id);
set @Id=@Id+1;
end
insert into Child(c_id, c_p) select NEWID(), p_id from Parent;
insert into Child(c_id, c_p) select NEWID(), p_id from Parent;
insert into Child(c_id, c_p) select NEWID(), p_id from Parent;
;

Now I have these two equivalent queries, one using inner and the other using left join:

Inner query:

select *
from Child c
inner join Parent p
on p.p_id=c.c_p
where p.p_col1=1 or p.p_col2=2;

Left Join query:

select *
from Child c
left join Parent p
on p.p_id=c.c_p
where p.p_col1=1 or p.p_col2=2;

I thought that the sql optimizer would be smart enough to figure out the same execution plan for these two query, but it's not the case. The plan for the inner query is this: enter image description here

The plan for the left join query is this: enter image description here

The optimizer works nice, chosing the same plan, if I have only one condition like:

where p.p_col1=1 

But if I add an "or" on a second different column then it doesn't chose the best plan anymore:

where p.p_col1=1 or p.p_col2=2;

Am I missing something or it is just the optimizer that is missing this improvement?


Solution

  • Clearly, it is the optimizer.

    When you have one condition in the WHERE clause (and "condition" could be a condition connected with ANDs, but not ORs), then the optimizer can easily peak and say "yes, the condition has rows from the second table, there is no NULL value comparison, so this is really an inner join".

    That logic gets harder when the conditions are connected by OR. I think you have observed that the optimizer does not do this for more complex conditions.