I have a table with the following structure:
name | version | processed | processing | updated | ref_time
------+---------+-----------+------------+----------+----------
abc | 1 | t | f | 27794395 | 27794160
def | 1 | t | f | 27794395 | 27793440
ghi | 1 | t | f | 27794395 | 27793440
jkl | 1 | f | f | 27794395 | 27794160
mno | 1 | t | f | 27794395 | 27793440
pqr | 1 | f | t | 27794395 | 27794160
I can use the following query to count the total number within each ref_time
:
SELECT ref_time, COUNT (*) AS total
FROM (SELECT * FROM status_table) AS _
GROUP BY ref_time;
ref_time | total
----------+-------
27794160 | 2259
27793440 | 2259
And the following query to count the total number within each ref_time
where processed=true
:
SELECT ref_time, COUNT (*) AS processed FROM (SELECT * FROM status_table WHERE processed=true) AS _ GROUP BY ref_time;
ref_time | processed
----------+-----------
27794160 | 1057
27793440 | 2259
I then try to merge the information using an INNER JOIN
on ref_time
:
SELECT * FROM
(SELECT ref_time, COUNT (*) AS total
FROM (SELECT * FROM status_table) AS _
GROUP BY ref_time) result_total
INNER JOIN
(SELECT ref_time, COUNT (*) AS processed
FROM (SELECT * FROM status_table WHERE processed=true) AS _
GROUP BY ref_time) result_processed
ON result_total.ref_time = result_processed.ref_time;
ref_time | total | ref_time | processed
----------+-------+----------+-----------
27794160 | 2259 | 27794160 | 1057
27793440 | 2259 | 27793440 | 2259
First question: how do I avoid the duplicated ref_time
column?
Second question: how do I add an additional percent
column derived as (100 * processed / total)
(to one d.p.), i.e. to give:
ref_time | total | processed | percent
----------+-------+-----------+---------
27794160 | 2259 | 1057 | 46.8
27793440 | 2259 | 2259 | 100.0
Third question: is there a more efficient way to do this? Can I avoid making two separate SELECT
queries?
Postgres has expressive aggregate functions.
To do the conditional count, we can use the standard filter
clause directly against column processed
, which is a boolean. As for the percentage (or ratio), we can cast the boolean to an integer (which yields 0
or 1
, as you would expect), and take the average of that.
So:
select ref_time,
count(*) cnt_total,
count(*) filter(where processed) cnt_processed,
avg(processed::int) ratio_processed
from mytable
group by ref_time
Here is a demo with your sample data, which returns:
ref_time | cnt_total | cnt_processed | ratio_processed |
---|---|---|---|
27794160 | 3 | 1 | 0.33333333333333333333 |
27793440 | 3 | 3 | 1.00000000000000000000 |