Search code examples
sql-servert-sql

Using Subquery or INNER JOIN


Item 2 of the challenge 2 of this Lab from Microsoft SQL Server Team, is as follows:

Challenge 1: retrieve product price information

AdventureWorks products each have a standard cost price that indicates the cost of manufacturing the product, and a list price that indicates the recommended selling price for the product. This data is stored in the SalesLT.Product table. Whenever a product is ordered, the actual unit price at which it was sold is also recorded in the SalesLT.SalesOrderDetail table. You must use subqueries to compare the cost and list prices for each product with the unit prices charged in each sale.

  1. Retrieve products whose list price is higher than the average unit price.

    Retrieve the product ID, name, and list price for each product where the list price is higher than the average unit price for all products that have been sold.

  2. Retrieve Products with a list price of 100 or more that have been sold for less than 100.

    Retrieve the product ID, name, and list price for each product where the list price is 100 or more, and the product has been sold for less than 100.

A solution (provided on the same link above) for the above challenge, they have presented with the following query that returns 7 records:

 SELECT ProductID, Name, ListPrice
 FROM SalesLT.Product
 WHERE ProductID IN (SELECT ProductID
                     FROM SalesLT.SalesOrderDetail
                     WHERE UnitPrice < 100.00)
   AND ListPrice >= 100.00
 ORDER BY ProductID;
ProductID Name ListPrice
810 HL Mountain Handlebars 120.27
813 HL Road Handlebars 120.27
876 Hitch Rack - 4-Bike 120
894 Rear Derailleur 121.46
907 Rear Brakes 106.5
948 Front Brakes 106.5
996 HL Bottom Bracket 121.49

But, with the same search criteria, my own following query returns 27 records (shown below) where the list prices >=100 and UnitPrice < 100:

Question: Why the solution suggested by Microsoft Lab is correct, and what I am doing is not correct:

SELECT p.ProductID, p.Name, p.ListPrice, s.UnitPrice
FROM SalesLT.Product p
JOIN SalesLT.SalesOrderDetail s ON p.ProductID = s.ProductID  
WHERE p.ListPrice >= 100.00
  AND s.UnitPrice < 100.00
ORDER BY ProductID;
ProductID Name ListPrice UnitPrice
810 HL Mountain Handlebars 120.27 72.162
810 HL Mountain Handlebars 120.27 72.162
810 HL Mountain Handlebars 120.27 72.162
813 HL Road Handlebars 120.27 72.162
813 HL Road Handlebars 120.27 72.162
813 HL Road Handlebars 120.27 72.162
876 Hitch Rack - 4-Bike 120.00 72.00
876 Hitch Rack - 4-Bike 120.00 72.00
876 Hitch Rack - 4-Bike 120.00 72.00
876 Hitch Rack - 4-Bike 120.00 72.00
876 Hitch Rack - 4-Bike 120.00 72.00
876 Hitch Rack - 4-Bike 120.00 72.00
876 Hitch Rack - 4-Bike 120.00 72.00
876 Hitch Rack - 4-Bike 120.00 72.00
894 Rear Derailleur 121.46 72.876
894 Rear Derailleur 121.46 72.876
907 Rear Brakes 106.50 63.90
948 Front Brakes 106.50 63.90
948 Front Brakes 106.50 63.90
948 Front Brakes 106.50 63.90
948 Front Brakes 106.50 63.90
948 Front Brakes 106.50 63.90
948 Front Brakes 106.50 63.90
948 Front Brakes 106.50 63.90
996 HL Bottom Bracket 121.49 72.894
996 HL Bottom Bracket 121.49 72.894
996 HL Bottom Bracket 121.49 72.894
996 HL Bottom Bracket 121.49 72.894

Solution

  • First, let's be a little pedantic and consider the difference between what the task asked from you and what you did instead. The task asked you to use subselect, with the main select querying SalesLT.Product and the subselect querying SalesLT.SalesOrderDetail. Instead of doing as the task instructed you, you did a join. You could technically achieve the goal of the task with a join too, but you need to follow the instructions and, if you are being asked to use a subquery, then do so.

    Second, the logical error that you committed is that when you join SalesLt.Product with SalesLT.SalesOrderDetail, then you join all matches, so if you have a product that was sold 8 times, then you will get 8 pairs and consequently you will have 8 records corresponding to that product instead of 1 (see ProductID 876 for example), which would have been the result if you followed the instruction and would have used a subquery. But, to achieve the same with a join, you can do a group by, that is, you merge your multiple product-sales tuples into singular records, exactly one record for each product that has a single match:

    SELECT p.ProductID, p.Name, p.ListPrice
    FROM SalesLT.Product p
    JOIN SalesLT.SalesOrderDetail s ON p.ProductID = s.ProductID  
    WHERE p.ListPrice >= 100.00
      AND s.UnitPrice < 100.00
    GROUP BY p.ProductID, p.Name, p.ListPrice
    ORDER BY ProductID;
    

    Third, your join and their subselect is not optimal, as they will search for all pairs instead of a single example. EXISTS search for a single match in the subselect and, if found a match, stops searching for a new one, whereas IN finds ALL matches and therefore takes more time and in our case we are only interested in EXISTS, so:

     SELECT ProductID, Name, ListPrice
     FROM SalesLT.Product
     WHERE EXISTS (SELECT SalesLT.SalesOrderDetail.ProductID
                   FROM SalesLT.SalesOrderDetail
                   WHERE UnitPrice < 100.00 AND SalesLT.SalesOrderDetail.ProductID = SalesLT.Product.ProductID)
       AND ListPrice >= 100.00
     ORDER BY ProductID;
    

    Fourth, the first task has not been discussed. It will look somewhat similarly to their subselect:

     SELECT ProductID, Name, ListPrice
     FROM SalesLT.Product
     WHERE ListPrice > (SELECT AVG(UnitPrice)
                        FROM SalesLT.SalesOrderDetail
                        WHERE SalesLT.Product.ProductID = SalesLT.SalesOrderDetail.ProductID)
    ORDER BY ProductID;