sqlpostgresql

LEFT join has missing dates/gaps when working with generate_series()


I'm trying to create a summary report of sales per year with a couple of tables like the following:

products
id (PK)
[... ]
Orders
id (PK)
date
product_id (FK to products

I've written a query where I select from a time series built from the oldest to the most recent orders and then left joined that with the two tables, grouping by product_id and year timestamp:

SELECT 
    date_trunc('year'::text, years) AS year,
    products.id as product_id,
    coalesce(count(o.id),0) AS total_orders
   FROM generate_series(
       (select MIN(orders.date) from orders),
       (select MAX(orders.date) from orders),
       '1 year'::interval
   ) years
     left JOIN orders o ON date_trunc('year'::text, years) = date_trunc('year'::text, o.date)
     join products on products.id = o.product_id
     
  GROUP BY year, products.id
  ORDER BY year, products.id

The result of the query looks like this:

year product_id total_orders
2012-01-01 1 4
2013-01-01 1 3
2013-01-01 2 1

Due to the left join, I was expecting to see another row with coalescing the null results to zero:

year product_id total_orders
2012-01-01 2 0

Any ideas as to why the left join on the time series is not being returned as a row pointing at zero total_orders when no rows exist in orders for the given year?


Solution

  • If you want to get all years since the earliest one, and for each of them, all possible products with total_orders for each. Demo at db<>fiddle:

    SELECT 
     years.year  AS year,
     products.id AS product_id,
     count(o.id) AS total_orders
    FROM generate_series(
        (select date_trunc('year', MIN(orders.date)) from orders),
        (select MAX(orders.date) from orders),
        '1 year') AS years(year)
      JOIN (select distinct id from products) AS products ON true
      LEFT JOIN orders o ON years.year = date_trunc('year',o.date)
                         AND products.id = o.product_id
    GROUP BY year, products.id
    ORDER BY year, products.id
    
    1. There's no need to coalesce(count(o.id),0).
    2. If you date_trunc() in generate_series(), you don't need to repeat it elsewhere.

    I think you might be reading your joins wrong. For your expected output to appear, this:

    years LEFT JOIN orders ... INNER JOIN products ...
    

    Would have to be evaluated like this:

    years LEFT JOIN (orders ... INNER JOIN products ...)
    

    Meanwhile, it's evaluated sequentially, left-to-right, like so:

    (years LEFT JOIN orders ...) INNER JOIN products ...
    

    And it's outlined in the SELECT doc along with a hint on how you can force alternative behaviour:

    Use parentheses if necessary to determine the order of nesting. In the absence of parentheses, JOINs nest left-to-right.