I'm looking to gather counts by hour. But not every hour is represented in my table.
To make sure the data always includes empty hours, I built an hour table that has hours in datetime from 2000-2037. I figure I can LEFT JOIN
data tables to this table to keep track of missing hours. But I need help doing so.
Table date_hour:
`hour`
2000-01-01 00:00:00
2000-01-01 01:00:00
...
2036-12-31 23:00:00
Table my_data:
log_date             field1
2015-05-01 00:31:00  1000
2015-05-01 04:19:00  2000
2015-05-01 05:19:00  1000
2015-05-01 07:19:00  100
2015-05-01 07:35:00  6000
Desired result:
hour                 count
2015-05-01 00:00:00  1
2015-05-01 01:00:00  0
2015-05-01 02:00:00  0
2015-05-01 03:00:00  0
2015-05-01 04:00:00  1
2015-05-01 05:00:00  1
2015-05-01 06:00:00  0
2015-05-01 07:00:00  2
MySQL attempt:
SELECT
dh.hour,
COUNT(md.*) AS count
FROM
date_hour dh
LEFT JOIN my_data md ON dh.hour = ????md.log_date????
WHERE
dh.hour >= '2015-05-01'
AND dh.hour < '2015-05-02'
GROUP BY
dh.hour
ORDER BY
dh.hour;
What's the most efficient way to accomplish these counts? Assume each day has 100k-1MM records, with the goal of measuring at least 30 days of data at a time.
You can use DATE_FORMAT to strip off the minutes and seconds, so that each log_date matches the hour it falls in, like this:
Query:
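SELECT
    dh.hour,
    -- count a my_data column so hours with no matching rows come out as 0
    COUNT(md.log_date) AS count
FROM
    date_hour dh
    LEFT JOIN my_data md
        -- truncate log_date to the start of its hour before comparing
        ON dh.hour = DATE_FORMAT(md.log_date, '%Y-%m-%d %H:00:00')
WHERE
    dh.hour >= '2015-05-01'
    AND dh.hour < '2015-05-02'
GROUP BY
    dh.hour
ORDER BY
    dh.hour;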
Output:
+------------------------+------------+
| hour                   | count      |
+------------------------+------------+
| 2015-05-01 00:00:00    | 1          |
| 2015-05-01 01:00:00    | 0          |
| 2015-05-01 02:00:00    | 0          |
| 2015-05-01 03:00:00    | 0          |
| 2015-05-01 04:00:00    | 1          |
| 2015-05-01 05:00:00    | 1          |
| 2015-05-01 06:00:00    | 0          |
| 2015-05-01 07:00:00    | 2          |
| ... trailing hours ... | all zeroes |
+------------------------+------------+
Every hour from 2015-05-01 08:00:00 onward shows zero, because the sample my_data has no rows after 07:35:00.
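On the efficiency question: joining on DATE_FORMAT(md.log_date, ...) wraps log_date in a function, which generally keeps MySQL from using a plain index on that column for the join. With 100k-1MM rows per day, one alternative worth testing is to aggregate my_data down to one row per hour first and only then LEFT JOIN the small result to date_hour. The following is a sketch, not a benchmarked answer. It assumes an index exists on my_data.log_date (not stated in the question), and the names hour_start, cnt and c are made up for illustration.

```sql
SELECT
    dh.hour,
    -- hours with no row in the subquery get NULL, so turn that into 0
    COALESCE(c.cnt, 0) AS count
FROM
    date_hour dh
    LEFT JOIN (
        -- collapse my_data to at most one row per hour in the range;
        -- the range filter on the bare column can use an index on log_date
        SELECT
            DATE_FORMAT(log_date, '%Y-%m-%d %H:00:00') AS hour_start,
            COUNT(*) AS cnt
        FROM
            my_data
        WHERE
            log_date >= '2015-05-01'
            AND log_date < '2015-05-02'
        GROUP BY
            hour_start
    ) c ON c.hour_start = dh.hour
WHERE
    dh.hour >= '2015-05-01'
    AND dh.hour < '2015-05-02'
ORDER BY
    dh.hour;
```

For a 30-day window the subquery returns at most 720 rows, so the join to date_hour stays small however many log rows there are. The date range has to be kept in sync in both WHERE clauses. Comparing the EXPLAIN output of both queries on the real data is the way to confirm which one performs better.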