The problem looks like this:
Write an SQL query that selects the product id, year, quantity, and price for the first year of every product sold.
Return the resulting table in any order.
A simple example of the question:
Input:
Sales table:
+---------+------------+------+----------+-------+
| sale_id | product_id | year | quantity | price |
+---------+------------+------+----------+-------+
| 1 | 100 | 2008 | 10 | 5000 |
| 2 | 100 | 2009 | 12 | 5000 |
| 7 | 200 | 2011 | 15 | 9000 |
+---------+------------+------+----------+-------+
Product table:
+------------+--------------+
| product_id | product_name |
+------------+--------------+
| 100 | Nokia |
| 200 | Apple |
| 300 | Samsung |
+------------+--------------+
Output:
+------------+------------+----------+-------+
| product_id | first_year | quantity | price |
+------------+------------+----------+-------+
| 100 | 2008 | 10 | 5000 |
| 200 | 2011 | 15 | 9000 |
+------------+------------+----------+-------+
My code:
SELECT product_id, year AS first_year, quantity, price
FROM Sales
WHERE year IN (
SELECT MIN(year) as year
FROM Sales
GROUP BY product_id) ;
My code worked with the above simple example but failed on a longer test case.
The expected query:
SELECT product_id, year AS first_year, quantity, price
FROM Sales
WHERE (product_id, year) IN (
SELECT product_id, MIN(year) as year
FROM Sales
GROUP BY product_id) ;
So I don't understand why I have to put productid when filtering with where. Doesn't SQL automatically choose the corresponding product_id with the first year? Any hints would be greatly appreciated!
Here's a step-by-step of why the first query doesn't work. Note that I've omitted some fields that are unused for the sake of brevity.
Imagine your sales data contained the following data:
product_id | year |
---|---|
100 | 2008 |
100 | 2011 |
100 | 2011 |
100 | 2011 |
100 | 2011 |
200 | 2011 |
Based on this data, the inner subquery of your first query:
SELECT MIN(year) as year
FROM Sales
GROUP BY product_id
will produce a result as follows:
MIN(year) |
---|
2008 |
2011 |
And so then your query is effectively doing the following:
SELECT product_id, year AS first_year
FROM Sales
WHERE year IN (2008, 2011)
So this query is going to find all the sales that occurred in 2008 and all the sales that occurred in 2011. It is not going to filter by product_id as that is not specified in the WHERE statement. So it'll yield results as follows which is not what you want:
product_id | first_year |
---|---|
100 | 2008 |
100 | 2011 |
100 | 2011 |
100 | 2011 |
100 | 2011 |
200 | 2011 |
This is why you need to specify the product_id in your IN
statement.
On a general note, when debugging SQL, evaluate the inner-most queries first and then work outward as I have done in this answer.