Search code examples
postgresqlsubquerycommon-table-expression

Using min/max values from a CTE in a later query, instead of using a subquery in Postgres


I've got a remedial question about pulling results out of a CTE in a later part of the query. For the example code, below are the relevant, stripped down tables:

CREATE TABLE print_job (
   created_dts  timestamp  not null default now(),
   status       text       not null
);

CREATE TABLE calendar_day (
   date_actual date not null
);

In the current setup, there are gaps in the dates in the print_job data, and we would like to have a gapless result. For example, there are 87 days from the first to last date in the table, and only 77 days in there have data. We've already got a calendar_day dimension table to join with to get the 87 rows for the 87-day range. It's easy enough to figure out the min and max dates in the data with a subquery or in a CTE, but I don't know how to use those values from a CTE. I've got a full query below, but here are the relevant fragments with comments:

-- Get the date range from the data.
date_range AS (
    select min(created_dts::date) AS start_date,
           max(created_dts::date) AS end_date
           
   from print_job),
   
-- This CTE does not work because it doesn't know what date_range is.
complete_date_series_using_cte AS (
     select actual_date
     
       from calendar_day

   where actual_date >= date_range.start_date  
     and actual_date <= date_range.end_date
),

-- Subqueries are fine, because the FROM is specified in the subquery condition directly.
complete_date_series_using_subquery AS (

   select date_actual
     
     from calendar_day

    where date_actual >= (select min(created_dts::date) from print_job)
      and date_actual <= (select max(created_dts::date) from print_job)   
)

I run into this regularly, and finally figured I'd ask. I've hunted around already for an answer, but I'm not clear how to summarize it well. And while there's nothing wrong with the subqueries in this case, I've got other situations where a CTE is nicer/more readable.

If it helps, I've listed the complete query below.

-- Get some counts and give them names.
WITH 
daily_status AS (
    select created_dts::date as created_date,
           count(*)                                        AS daily_total,
           count(*) FILTER (where status = 'Error')        AS status_error,
           count(*) FILTER (where status = 'Processing')   AS status_processing,
           count(*) FILTER (where status = 'Aborted')      AS status_aborted,
           count(*) FILTER (where status = 'Done')         AS status_done

     from print_job

     group by created_dts::date
),

-- Get the date range from the data.
date_range AS (
    select min(created_dts::date) AS start_date,
           max(created_dts::date) AS end_date
           
   from print_job),
   
-- There are gaps in the data, and we want a row for dates with no results.  
-- Could use generate_series on a timestamp & convert that to dates. But,
-- in our case, we've already got dimension tables for days. All that's needed
-- here is the actual date. 

-- This CTE does not work because it doesn't know what date_range is.
-- complete_date_series_using_cte AS (
--      select actual_date
--      
--        from calendar_day
-- 
--    where actual_date >= date_range.start_date  
--      and actual_date <= date_range.end_date
-- ),

complete_date_series_using_subquery AS (

   select date_actual
     
     from calendar_day

    where date_actual >= (select min(created_dts::date) from print_job)
      and date_actual <= (select max(created_dts::date) from print_job)
     
)

-- The final query joins the complete date series with whatever data is in the print_job table daily summaries.
select date_actual,
       coalesce(daily_total,0)        AS total, 
       coalesce(status_error,0)       AS errors,
       coalesce(status_processing,0)  AS processing,
       coalesce(status_aborted,0)     AS aborted,
       coalesce(status_done,0)        AS done
       
   from complete_date_series_using_subquery
left join daily_status
       on daily_status.created_date  = 
          complete_date_series_using_subquery.date_actual

    order by date_actual

Solution

  • I said it was a remedial question....I remembered where I'd seen this done before:

    https://tapoueh.org/manual-post/2014/02/postgresql-histogram/

    In my example, I need to list the CTE in the table list. That's obvious in retrospect, and I realize that I automatically don't think to do that as I'm habitually avoiding CROSS JOIN. The fragment below shows the slight change needed:

    WITH
    date_range AS ( 
    
    select min(created_dts)::date as start_date,
           max(created_dts)::date as end_date
           
      from print_job
    ),        
    
    complete_date_series AS ( 
    
    select date_actual 
      from calendar_day, date_range
     where date_actual >= date_range.start_date   
       and date_actual <= date_range.end_date
    ),