Item 2 of the challenge 2 of this Lab from Microsoft SQL Server Team, is as follows:
Challenge 1: retrieve product price information
AdventureWorks products each have a standard cost price that indicates the cost of manufacturing the product, and a list price
that indicates the recommended selling price for the product. This data is stored in the SalesLT.Product
table. Whenever a product is ordered, the actual unit price
at which it was sold is also recorded in the SalesLT.SalesOrderDetail
table. You must use subqueries to compare the cost and list prices for each product with the unit prices charged in each sale.
Retrieve products whose list price is higher than the average unit price.
Retrieve the product ID, name, and list price for each product where the list price is higher than the average unit price for all products that have been sold.
Retrieve Products with a list price of 100 or more that have been sold for less than 100.
Retrieve the product ID, name, and list price for each product where the list price is 100 or more, and the product has been sold for less than 100.
A solution (provided on the same link above) for the above challenge, they have presented with the following query that returns 7 records:
SELECT ProductID, Name, ListPrice
FROM SalesLT.Product
WHERE ProductID IN (SELECT ProductID
FROM SalesLT.SalesOrderDetail
WHERE UnitPrice < 100.00)
AND ListPrice >= 100.00
ORDER BY ProductID;
ProductID | Name | ListPrice |
---|---|---|
810 | HL Mountain Handlebars | 120.27 |
813 | HL Road Handlebars | 120.27 |
876 | Hitch Rack - 4-Bike | 120 |
894 | Rear Derailleur | 121.46 |
907 | Rear Brakes | 106.5 |
948 | Front Brakes | 106.5 |
996 | HL Bottom Bracket | 121.49 |
But, with the same search criteria, my own following query returns 27 records (shown below) where the list prices >=100
and UnitPrice < 100
:
Question: Why the solution suggested by Microsoft Lab is correct, and what I am doing is not correct:
SELECT p.ProductID, p.Name, p.ListPrice, s.UnitPrice
FROM SalesLT.Product p
JOIN SalesLT.SalesOrderDetail s ON p.ProductID = s.ProductID
WHERE p.ListPrice >= 100.00
AND s.UnitPrice < 100.00
ORDER BY ProductID;
ProductID | Name | ListPrice | UnitPrice |
---|---|---|---|
810 | HL Mountain Handlebars | 120.27 | 72.162 |
810 | HL Mountain Handlebars | 120.27 | 72.162 |
810 | HL Mountain Handlebars | 120.27 | 72.162 |
813 | HL Road Handlebars | 120.27 | 72.162 |
813 | HL Road Handlebars | 120.27 | 72.162 |
813 | HL Road Handlebars | 120.27 | 72.162 |
876 | Hitch Rack - 4-Bike | 120.00 | 72.00 |
876 | Hitch Rack - 4-Bike | 120.00 | 72.00 |
876 | Hitch Rack - 4-Bike | 120.00 | 72.00 |
876 | Hitch Rack - 4-Bike | 120.00 | 72.00 |
876 | Hitch Rack - 4-Bike | 120.00 | 72.00 |
876 | Hitch Rack - 4-Bike | 120.00 | 72.00 |
876 | Hitch Rack - 4-Bike | 120.00 | 72.00 |
876 | Hitch Rack - 4-Bike | 120.00 | 72.00 |
894 | Rear Derailleur | 121.46 | 72.876 |
894 | Rear Derailleur | 121.46 | 72.876 |
907 | Rear Brakes | 106.50 | 63.90 |
948 | Front Brakes | 106.50 | 63.90 |
948 | Front Brakes | 106.50 | 63.90 |
948 | Front Brakes | 106.50 | 63.90 |
948 | Front Brakes | 106.50 | 63.90 |
948 | Front Brakes | 106.50 | 63.90 |
948 | Front Brakes | 106.50 | 63.90 |
948 | Front Brakes | 106.50 | 63.90 |
996 | HL Bottom Bracket | 121.49 | 72.894 |
996 | HL Bottom Bracket | 121.49 | 72.894 |
996 | HL Bottom Bracket | 121.49 | 72.894 |
996 | HL Bottom Bracket | 121.49 | 72.894 |
First, let's be a little pedantic and consider the difference between what the task asked from you and what you did instead. The task asked you to use subselect, with the main select querying SalesLT.Product
and the subselect querying SalesLT.SalesOrderDetail
. Instead of doing as the task instructed you, you did a join
. You could technically achieve the goal of the task with a join
too, but you need to follow the instructions and, if you are being asked to use a subquery, then do so.
Second, the logical error that you committed is that when you join SalesLt.Product
with SalesLT.SalesOrderDetail
, then you join all matches, so if you have a product that was sold 8 times, then you will get 8 pairs and consequently you will have 8 records corresponding to that product instead of 1 (see ProductID
876 for example), which would have been the result if you followed the instruction and would have used a subquery. But, to achieve the same with a join
, you can do a group by
, that is, you merge your multiple product-sales tuples into singular records, exactly one record for each product that has a single match:
SELECT p.ProductID, p.Name, p.ListPrice
FROM SalesLT.Product p
JOIN SalesLT.SalesOrderDetail s ON p.ProductID = s.ProductID
WHERE p.ListPrice >= 100.00
AND s.UnitPrice < 100.00
GROUP BY p.ProductID, p.Name, p.ListPrice
ORDER BY ProductID;
Third, your join
and their subselect is not optimal, as they will search for all pairs instead of a single example. EXISTS
search for a single match in the subselect and, if found a match, stops searching for a new one, whereas IN
finds ALL matches and therefore takes more time and in our case we are only interested in EXISTS
, so:
SELECT ProductID, Name, ListPrice
FROM SalesLT.Product
WHERE EXISTS (SELECT SalesLT.SalesOrderDetail.ProductID
FROM SalesLT.SalesOrderDetail
WHERE UnitPrice < 100.00 AND SalesLT.SalesOrderDetail.ProductID = SalesLT.Product.ProductID)
AND ListPrice >= 100.00
ORDER BY ProductID;
Fourth, the first task has not been discussed. It will look somewhat similarly to their subselect:
SELECT ProductID, Name, ListPrice
FROM SalesLT.Product
WHERE ListPrice > (SELECT AVG(UnitPrice)
FROM SalesLT.SalesOrderDetail
WHERE SalesLT.Product.ProductID = SalesLT.SalesOrderDetail.ProductID)
ORDER BY ProductID;