mysql datetime group-by hour

Displaying aggregate data per hour


I have a MySQL database with a table containing timestamped logins and I would like to get the number of logins for each hour of the last 24 hours.

At first, I tried something obvious:

SELECT
  timestamp AS Hour,
  COUNT(*) AS Logins
FROM auth
WHERE
  timestamp >= DATE_SUB(NOW(), INTERVAL 1 DAY)
GROUP BY HOUR(timestamp)
ORDER BY timestamp DESC

This results in output similar to the following:

+---------------------+--------+
| Hour                | Logins |
+---------------------+--------+
| 2017-10-08 17:00:05 |    272 |
| 2017-10-08 16:00:02 |    323 |
| 2017-10-08 15:00:34 |    301 |
| 2017-10-08 14:55:10 |     30 |
| 2017-10-08 11:04:27 |    107 |
| 2017-10-08 10:06:26 |    115 |
| 2017-10-08 09:00:11 |     92 |
| 2017-10-08 08:02:34 |    195 |
| 2017-10-08 07:03:15 |    171 |
| 2017-10-08 06:03:06 |    133 |
| 2017-10-08 05:00:20 |    102 |
| 2017-10-08 04:03:23 |    198 |
| 2017-10-08 03:00:23 |    345 |
| 2017-10-08 02:01:39 |    318 |
| 2017-10-08 01:01:22 |    205 |
| 2017-10-08 00:00:24 |    334 |
| 2017-10-07 23:00:00 |    501 |
| 2017-10-07 22:00:10 |    377 |
| 2017-10-07 21:00:02 |    482 |
| 2017-10-07 20:00:04 |    349 |
| 2017-10-07 19:00:54 |    298 |
| 2017-10-07 18:13:06 |    438 |
+---------------------+--------+
22 rows in set (0,02 sec)

There are two problems with this output. The first is that the timestamps are not on the whole hour, because the value shown is the timestamp of an individual login a few minutes or seconds past the hour. Also, I don't really need the date in the output. I have fixed this problem by doing the following:

SELECT
  DATE_FORMAT(DATE_ADD(timestamp, INTERVAL 30 MINUTE),'%H:00:00') AS Hour,
  COUNT(*) AS Logins
FROM auth
WHERE
  timestamp >= DATE_SUB(NOW(), INTERVAL 1 DAY)
GROUP BY HOUR(timestamp)
ORDER BY timestamp DESC

The output is now:

+----------+--------+
| Hour     | Logins |
+----------+--------+
| 17:00:00 |    272 |
| 16:00:00 |    323 |
| 15:00:00 |    301 |
| 15:00:00 |     30 |
| 11:00:00 |    107 |
| 10:00:00 |    115 |
| 09:00:00 |     92 |
| 08:00:00 |    195 |
| 07:00:00 |    171 |
| 06:00:00 |    133 |
| 05:00:00 |    102 |
| 04:00:00 |    198 |
| 03:00:00 |    345 |
| 02:00:00 |    318 |
| 01:00:00 |    205 |
| 00:00:00 |    334 |
| 23:00:00 |    501 |
| 22:00:00 |    377 |
| 21:00:00 |    482 |
| 20:00:00 |    349 |
| 19:00:00 |    298 |
| 18:00:00 |    452 |
+----------+--------+
22 rows in set (0,00 sec)

Question #1: Is this a good way of doing this (displaying the whole hour) or is there a better one?

I don't know how to solve the second problem. The above query was run at 18:19 local time, yet the logins between 18:00 and 18:19 are missing from the output. I understand that the current hour has not ended yet, but I would still like to display the data accumulated during the part of it that has elapsed.

Question #2: How to do this?


Solution

  • As @kmoser pointed out, the logins from the last (incomplete) hour of the 24-hour window are being grouped with those from the first hour of the window, because HOUR(timestamp) returns the same number for both. To fix this, group by a value that distinguishes the same hour on different days. Here is the solution that worked for me:

    SELECT
      DATE_FORMAT(timestamp, '%Y-%m-%d %H:00:00') AS Hour,
      COUNT(*) AS Logins
    FROM auth
    WHERE
      timestamp >= DATE_SUB(NOW(), INTERVAL 1 DAY)
    GROUP BY DATE_FORMAT(timestamp, '%Y-%m-%d %H:00:00')
    ORDER BY Hour DESC  -- sort by the bucket label; ORDER BY timestamp is rejected when ONLY_FULL_GROUP_BY is enabled
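
If the date should stay out of the displayed column (as in Question #1) while the buckets are still kept apart by day, one possible variant is sketched below. It assumes the same auth table and timestamp column as the queries above. Wrapping the label and the sort key in MIN() is my own choice so that the query should also be accepted when ONLY_FULL_GROUP_BY is enabled; it is not part of the original solution.

```sql
-- Sketch only: assumes the `auth` table and `timestamp` column from the question.
SELECT
  -- Label: hour only, truncated to the whole hour (no 30-minute rounding).
  -- Every row in a bucket shares the same date and hour, so MIN() yields that hour.
  DATE_FORMAT(MIN(timestamp), '%H:00:00') AS Hour,
  COUNT(*) AS Logins
FROM auth
WHERE
  timestamp >= DATE_SUB(NOW(), INTERVAL 1 DAY)
-- Bucket: date + hour, so the same hour on different days is not merged.
GROUP BY DATE_FORMAT(timestamp, '%Y-%m-%d %H:00:00')
-- Sort by an aggregate of the bucket instead of the raw column.
ORDER BY MIN(timestamp) DESC;
```

Because the window is a rolling 24 hours, the current partial hour and the partial hour from the previous day receive the same label (for example, two 18:00:00 rows when run at 18:19), appearing as the first and last rows of the result. Formatting the hour directly, without adding 30 minutes, also avoids pushing a login at 14:55 into the 15:00 label.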