I've got a remedial question about pulling results out of a CTE in a later part of the query. For the example code, below are the relevant, stripped down tables:
CREATE TABLE print_job (
created_dts timestamp not null default now(),
status text not null
);
CREATE TABLE calendar_day (
date_actual date not null
);
In the current setup, there are gaps in the dates in the print_job
data, and we would like to have a gapless result. For example, there are 87 days from the first to last date in the table, and only 77 days in there have data. We've already got a calendar_day dimension table to join with to get the 87 rows for the 87-day range. It's easy enough to figure out the min
and max
dates in the data with a subquery or in a CTE, but I don't know how to use those values from a CTE. I've got a full query below, but here are the relevant fragments with comments:
-- Get the date range from the data.
date_range AS (
select min(created_dts::date) AS start_date,
max(created_dts::date) AS end_date
from print_job),
-- This CTE does not work because it doesn't know what date_range is.
complete_date_series_using_cte AS (
select actual_date
from calendar_day
where actual_date >= date_range.start_date
and actual_date <= date_range.end_date
),
-- Subqueries are fine, because the FROM is specified in the subquery condition directly.
complete_date_series_using_subquery AS (
select date_actual
from calendar_day
where date_actual >= (select min(created_dts::date) from print_job)
and date_actual <= (select max(created_dts::date) from print_job)
)
I run into this regularly, and finally figured I'd ask. I've hunted around already for an answer, but I'm not clear how to summarize it well. And while there's nothing wrong with the subqueries in this case, I've got other situations where a CTE is nicer/more readable.
If it helps, I've listed the complete query below.
-- Get some counts and give them names.
WITH
daily_status AS (
select created_dts::date as created_date,
count(*) AS daily_total,
count(*) FILTER (where status = 'Error') AS status_error,
count(*) FILTER (where status = 'Processing') AS status_processing,
count(*) FILTER (where status = 'Aborted') AS status_aborted,
count(*) FILTER (where status = 'Done') AS status_done
from print_job
group by created_dts::date
),
-- Get the date range from the data.
date_range AS (
select min(created_dts::date) AS start_date,
max(created_dts::date) AS end_date
from print_job),
-- There are gaps in the data, and we want a row for dates with no results.
-- Could use generate_series on a timestamp & convert that to dates. But,
-- in our case, we've already got dimension tables for days. All that's needed
-- here is the actual date.
-- This CTE does not work because it doesn't know what date_range is.
-- complete_date_series_using_cte AS (
-- select actual_date
--
-- from calendar_day
--
-- where actual_date >= date_range.start_date
-- and actual_date <= date_range.end_date
-- ),
complete_date_series_using_subquery AS (
select date_actual
from calendar_day
where date_actual >= (select min(created_dts::date) from print_job)
and date_actual <= (select max(created_dts::date) from print_job)
)
-- The final query joins the complete date series with whatever data is in the print_job table daily summaries.
select date_actual,
coalesce(daily_total,0) AS total,
coalesce(status_error,0) AS errors,
coalesce(status_processing,0) AS processing,
coalesce(status_aborted,0) AS aborted,
coalesce(status_done,0) AS done
from complete_date_series_using_subquery
left join daily_status
on daily_status.created_date =
complete_date_series_using_subquery.date_actual
order by date_actual
I said it was a remedial question....I remembered where I'd seen this done before:
https://tapoueh.org/manual-post/2014/02/postgresql-histogram/
In my example, I need to list the CTE in the table list. That's obvious in retrospect, and I realize that I automatically don't think to do that as I'm habitually avoiding CROSS JOIN
. The fragment below shows the slight change needed:
WITH
date_range AS (
select min(created_dts)::date as start_date,
max(created_dts)::date as end_date
from print_job
),
complete_date_series AS (
select date_actual
from calendar_day, date_range
where date_actual >= date_range.start_date
and date_actual <= date_range.end_date
),