Tags: mysql, sql, date, datetime, hour

How can I select hourly counts from a table, including missing hours?


I'm looking to gather counts by hour. But not every hour is represented in my table.

To make sure the results always include empty hours, I built an hour table that holds one DATETIME row per hour from 2000 through 2036. I figure I can LEFT JOIN my data tables to it so that missing hours still show up, but I need help writing the join.

Table date_hour:

`hour`
2000-01-01 00:00:00
2000-01-01 01:00:00
...
2036-12-31 23:00:00

Table my_data:

log_date               field1
2015-05-01 00:31:00    1000
2015-05-01 04:19:00    2000    
2015-05-01 05:19:00    1000
2015-05-01 07:19:00    100
2015-05-01 07:35:00    6000

Desired result:

hour                   count
2015-05-01 00:00:00    1
2015-05-01 01:00:00    0
2015-05-01 02:00:00    0
2015-05-01 03:00:00    0
2015-05-01 04:00:00    1
2015-05-01 05:00:00    1
2015-05-01 06:00:00    0
2015-05-01 07:00:00    2

MySQL attempt:

SELECT
    dh.hour,
    COUNT(md.*) AS count
FROM
    date_hour dh
    LEFT JOIN my_data md ON dh.hour = ????md.log_date????
WHERE
        dh.hour >= '2015-05-01'
    AND dh.hour <  '2015-05-02'
GROUP BY
    dh.hour
ORDER BY
    dh.hour;

What's the most efficient way to accomplish these counts? Assume each day has 100k-1MM records, with the goal of measuring at least 30 days of data at a time.


Solution

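  • You can use DATE_FORMAT to truncate log_date to the start of its hour and join on that value. COUNT(md.*) is not valid MySQL, so count a column from my_data instead; COUNT skips the NULLs that the LEFT JOIN produces, so hours with no rows come out as 0:

    query

    SELECT
        dh.hour,
        COUNT(md.log_date) AS count
    FROM
        date_hour dh
        LEFT JOIN my_data md
            ON dh.hour = DATE_FORMAT(md.log_date, '%Y-%m-%d %H:00:00')
    WHERE
            dh.hour >= '2015-05-01'
        AND dh.hour <  '2015-05-02'
    GROUP BY
        dh.hour
    ORDER BY
        dh.hour;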
    

    output

    +------------------------+-----------+
    |          hour          |   count   |
    +------------------------+-----------+
    | 2015-05-01 00:00:00    | 1         |
    | 2015-05-01 01:00:00    | 0         |
    | 2015-05-01 02:00:00    | 0         |
    | 2015-05-01 03:00:00    | 0         |
    | 2015-05-01 04:00:00    | 1         |
    | 2015-05-01 05:00:00    | 1         |
    | 2015-05-01 06:00:00    | 0         |
    | 2015-05-01 07:00:00    | 2         |
    | ... trailing hours ... | all zero  |
    +------------------------+-----------+
    

    Every hour from 2015-05-01 08:00:00 onward is zero, because my_data has no rows for those hours.
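Because the join condition wraps md.log_date in DATE_FORMAT, MySQL generally cannot use an ordinary index on log_date for that comparison. One possible alternative, sketched here on the assumption that my_data has (or can be given) an index on log_date, is to join on a half-open one-hour range instead. The index name idx_my_data_log_date is hypothetical, and whether the optimizer actually picks the index is worth confirming with EXPLAIN on real data.

```sql
-- Assumption: an index on log_date exists or can be added; the name is illustrative.
CREATE INDEX idx_my_data_log_date ON my_data (log_date);

SELECT
    dh.hour,
    COUNT(md.log_date) AS count   -- counts matched rows only, so empty hours give 0
FROM
    date_hour dh
    LEFT JOIN my_data md
        ON  md.log_date >= dh.hour                     -- start of the hour, inclusive
        AND md.log_date <  dh.hour + INTERVAL 1 HOUR   -- start of the next hour, exclusive
WHERE
        dh.hour >= '2015-05-01'
    AND dh.hour <  '2015-05-02'
GROUP BY
    dh.hour
ORDER BY
    dh.hour;
```

The comparison leaves log_date untouched, so it stays eligible for an index range lookup, and the half-open interval assigns each row to exactly one hour.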

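For the volumes mentioned in the question (100k to 1MM rows per day, 30 or more days at a time), another option is to aggregate my_data first and then LEFT JOIN the small per-hour result to date_hour, so the join sees at most one row per hour. This is a sketch rather than a benchmarked recommendation; the names hour_start, cnt, and c are made up for the example.

```sql
SELECT
    dh.hour,
    COALESCE(c.cnt, 0) AS count          -- hours absent from my_data become 0
FROM
    date_hour dh
    LEFT JOIN (
        -- Collapse my_data to one row per hour that actually has data.
        SELECT
            -- Truncate to the hour, then cast back so the join compares DATETIME values.
            CAST(DATE_FORMAT(log_date, '%Y-%m-%d %H:00:00') AS DATETIME) AS hour_start,
            COUNT(*) AS cnt
        FROM
            my_data
        WHERE
                log_date >= '2015-05-01'   -- plain range filter on log_date
            AND log_date <  '2015-05-02'
        GROUP BY
            hour_start
    ) c ON c.hour_start = dh.hour
WHERE
        dh.hour >= '2015-05-01'
    AND dh.hour <  '2015-05-02'
ORDER BY
    dh.hour;
```

To cover 30 days, widen both date ranges together; the outer filter on dh.hour determines which hours appear, and the inner filter on log_date limits how much of my_data is scanned.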