I have a table with log records. Each log record is represented by a status (open
or closed
) and a date
:
CREATE TABLE logs (
id BIGSERIAL PRIMARY KEY,
status VARCHAR NOT NULL,
inserted_at DATE NOT NULL
);
I need to get a daily report with a following information:
status = open
were created,status = closed
were created,status = open
exist to this day, including this day.Here's a sample report output:
day | created | closed | total_open
------------+---------+--------+------------
2017-01-01 | 2 | 0 | 2
2017-01-02 | 2 | 1 | 3
2017-01-03 | 1 | 1 | 3
2017-01-04 | 1 | 0 | 4
2017-01-05 | 1 | 0 | 5
2017-01-06 | 1 | 0 | 6
2017-01-07 | 1 | 0 | 7
2017-01-08 | 0 | 1 | 6
2017-01-09 | 0 | 0 | 6
2017-01-10 | 0 | 0 | 6
(10 rows)
I solved this in a very "dirty" way:
INSERT INTO logs (status, inserted_at) VALUES
('created', '2017-01-01'),
('created', '2017-01-01'),
('closed', '2017-01-02'),
('created', '2017-01-02'),
('created', '2017-01-02'),
('created', '2017-01-03'),
('closed', '2017-01-03'),
('created', '2017-01-04'),
('created', '2017-01-05'),
('created', '2017-01-06'),
('created', '2017-01-07'),
('closed', '2017-01-08');
SELECT days.day,
count(case when logs.inserted_at = days.day AND logs.status = 'created' then 1 end) as created,
count(case when logs.inserted_at = days.day AND logs.status = 'closed' then 1 end) as closed,
count(case when logs.inserted_at <= days.day AND logs.status = 'created' then 1 end) -
count(case when logs.inserted_at <= days.day AND logs.status = 'closed' then 1 end) as total
FROM (SELECT day::date FROM generate_series('2017-01-01'::date, '2017-01-10'::date, '1 day'::interval) day) days,
logs
GROUP BY days.day
ORDER BY days.day;
Also (posted it on gist for brevity), and would like to improve the solution.
Right now explain
for my query reveals some ridiculous cost numbers that that I would like to minimize (I don't have indexes yet).
How an efficient query to achieve the report above would look like?
A possible solution is to use window functions:
select s.*, sum(created - closed) over (order by inserted_at)
from (select inserted_at,
count(status) filter (where status = 'created') created,
count(status) filter (where status = 'closed') closed
from (select d::date inserted_at
from generate_series('2017-01-01'::date, '2017-01-10'::date, '1 day'::interval) d) d
left join logs using (inserted_at)
group by inserted_at) s
http://rextester.com/GFRRP71067
Also, an index on (inserted_at, status)
could help you a lot with this query.
Note: count(...) filter (where ...)
is really just a fancy way to write count(case when ... then ... [else null] end)
.