I am new to SQL and learning joins.
Please consider the two queries below:
When I run the first query below:
select count(*)
from products p
left outer join order_details_v od
on p.product_id = od.order_item_product_id
Returned count value is 12332
But when I am this second query:
select count(*)
from products p
left outer join order_details_v od
on p.product_id = od.order_item_product_id
and to_char(od.order_date::timestamp, 'yyyy-MM') = '2014-01'
Returned count value is 1224
Why is the count is different in the second query? I understand that we have added an additional condition in the join, but since it is a left outer join, it should keep all the rows of the left table and therefore count should be same. Where am I wrong?
If you change the query from Count(*) to * you can easily inspect what is occurring.
Using a LEFT OUTER JOIN with the additional filter on the order date will always return at least one row for each of the products. Additionally, each product that had orders on 2014-01 will likely have multiple lines for an order. That will expand the count of rows needed to report the product.
As you learn more about SQL, explore more SQL commands to examine results. If you add Select COUNT (DISTINCT p.product_id) to your query you'll notice that the count remains the same.
Your SQL returns the COUNT(products where the product order date is null) + SUM(product detail counts where the order day is 2014-01)
One additional point beginners don't realize is if you add the restriction to a where clause instead of the on clause such as Select ... from ... Where order day is '2014-01' your result will behave like an inner join and only products that have been ordered on the day will be returned. The data with null values in the order date of the right side of the join were null after the outer join, but in the where clause null does not equal order day is '2014-01. So all of those records are removed instead of reported.
Good luck as you learn the language.