sql db2 subquery query-optimization inner-join

Which query is the fastest--the one using inner join or the one using a subquery?

I have two queries giving me the same result. Which one is the most efficient?

The model:

Order {  
 orderId 
}

OrderItem {  
 orderItemId  
 orderId   
 productId 
}

Product {   
 productId 
}

The relationship between Order and OrderItem is one-to-many, and many OrderItems are related to one Product.

I'd like to retrieve the orders that are related to a specific product (in the requests the parameter :productId).

The first request with inner join:

Select distinct o
from Order o
inner join OrderItem oi
on o.id =
oi.orderId  and oi.productId = :productId

The second request with a count subquery in the where clause:

Select o 
from Order o
where 
    (Select count oi 
    from OrderItem oi 
    where oi.orderId = o.id and oi.productId = :productId
    ) > 0

I also use DB2 and Hibernate. And there is an index on each primary and foreign keys.

Solution

Your queries aren't valid. You must say SELECT column rather than SELECT table. I assume you want this.

Select distinct o.id    -- COLUMN NAME here.
from Order o
inner join OrderItem oi
on o.id =
oi.orderId  and oi.productId = :productId

You want something straightforward: a set of ids from one table that meet a condition in a joined table. You could probably get this with a very simple one-table query.

SELECT DISTINCT oi.orderId AS Id
FROM OrderItem oi
WHERE oi.productId = :productId

But, unless there's an enforced foreign-key constraint between oi.orderId and o.Id this might get you some extra orderIds: ones that exist in the OrderItem table but not in the Order table.

Your first example is a good way -- an ideal way, even -- to overcome that objection.

Your second example is not so good. It has a correlated subquery. That is, the subquery refers to a column in a table alias -- o.id -- from the outer query. That means it must potentially repeat the evaluation of the subquery for each row from the outer query. That cannot be faster than your first query, even with the smartest query planner in the galaxy.

Now, if instead you want all the columns of your Order table (SELECT o.*) matching the particular query in the second table, use this.

SELECT o.*
  FROM Order o
 WHERE o.Id IN (
          SELECT oi.orderId
            FROM OrderItem oi
           WHERE oi.productId = :productId
     )

That's the fastest way to retrieve entire rows from one table that match some criterion in another table. Notice that the IN () operator automatically handles duplicate values.

Pro tip Use SELECT DISTINCT sparingly. If you find yourself using it on more than one column, you probably need to try to understand why you might have duplicates.