Grouping by date, with 0 when count() yields no lines


I'm using PostgreSQL 9 and I'm struggling with counting and grouping when some groups contain no rows.

Let's assume the following schema:

create table views (
 date_event timestamp with time zone,
 event_id integer
);

Let's imagine the following content:

2012-01-01 00:00:05    2
2012-01-01 01:00:05    5
2012-01-01 03:00:05    8
2012-01-01 03:00:15    20

I want to group by hour and count the number of rows. I would like to get the following:

2012-01-01 00:00:00    1
2012-01-01 01:00:00    1
2012-01-01 02:00:00    0
2012-01-01 03:00:00    2
2012-01-01 04:00:00    0
2012-01-01 05:00:00    0
.
.

2012-01-07 23:00:00    0

I mean that for each time slot, I count the rows in my table whose date falls in that slot; if there are none, I still want a row with a count of zero.

The following will definitely not work (it will yield only rows with a count > 0).

SELECT  extract ( hour from date_event ),count(*)
FROM views
where date_event > '2012-01-01' and date_event <'2012-01-07'
GROUP BY extract ( hour from date_event );

Please note that I might also need to group by minute, by day, by month, or by year (multiple queries are possible, of course).

I can only use plain old SQL, and since my views table can be very big (>100M records), I'm trying to keep performance in mind.

How can this be achieved?

Thank you!


Solution

  • Because the empty time slots do not exist as rows in the table, you need a way to generate them. You can use the generate_series function:

    SELECT * FROM generate_series('2012-01-01'::timestamp, '2012-01-07 23:00', '1 hour') AS ts;
    

    This will produce results like this:

             ts          
    ---------------------
     2012-01-01 00:00:00
     2012-01-01 01:00:00
     2012-01-01 02:00:00
     2012-01-01 03:00:00
    ...
     2012-01-07 21:00:00
     2012-01-07 22:00:00
     2012-01-07 23:00:00
    (168 rows)
    

    The remaining task is to join the two selects with an outer join, like this:

    -- Uses the question's table and column (views, date_event).
    select extract(day from ts) as day, extract(hour from ts) as hour, coalesce(cnt.count, 0) as count
    from
    (
        select extract(day from date_event) as day, extract(hour from date_event) as hr, count(*)
        from views
        where date_event > '2012-01-01' and date_event < '2012-01-07'
        group by extract(day from date_event), extract(hour from date_event)
    ) as cnt
    right outer join (
        select * from generate_series('2012-01-01'::timestamp, '2012-01-07 23:00', '1 hour') as ts
    ) as dtetable
        on extract(hour from ts) = cnt.hr and extract(day from ts) = cnt.day
    order by day, hour asc;
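Matching on extract(day ...) and extract(hour ...) only identifies a slot uniquely while the range stays within a single month. As a possible variant (a sketch, not the original answer's query), the counts can be keyed on date_trunc('hour', date_event), so that each slot is identified by its full timestamp. The sketch below assumes the question's views(date_event, event_id) table; the aliases slots, cnt and n, and the upper bound '2012-01-08', are illustrative choices, not taken from the original.

```sql
-- Sketch: one row per hourly slot, with 0 for slots that have no rows.
-- Assumes the question's table: views(date_event timestamptz, event_id integer).
SELECT slots.ts           AS slot_start,
       coalesce(cnt.n, 0) AS views_count
FROM generate_series('2012-01-01 00:00'::timestamptz,
                     '2012-01-07 23:00'::timestamptz,
                     interval '1 hour') AS slots(ts)
LEFT JOIN (
    -- Aggregate the big table once, before joining to the generated slots.
    SELECT date_trunc('hour', date_event) AS ts,
           count(*)                       AS n
    FROM views
    WHERE date_event >= '2012-01-01'
      AND date_event <  '2012-01-08'   -- assumed bound: covers the last generated slot
    GROUP BY 1
) AS cnt ON cnt.ts = slots.ts
ORDER BY slots.ts;
```

To group by another granularity, change the step and the date_trunc unit together, for example interval '1 day' with date_trunc('day', ...), or interval '1 month' with date_trunc('month', ...). Because the aggregation runs once over the filtered rows, an index on date_event may help with the range filter on a large table, though that depends on how selective the date range is.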