
MySQL Count data for last 7 days


I have the following schema.

Table votes

+------------------+--------------+------+-----+---------------------+----------------+
| Field            | Type         | Null | Key | Default             | Extra          |
+------------------+--------------+------+-----+---------------------+----------------+
| id               | int(10)      | NO   | PRI | NULL                | auto_increment |
| aid              | varchar(10)  | NO   |     |                     |                |
| ip               | varchar(100) | NO   |     |                     |                |
| host             | varchar(200) | NO   |     |                     |                |
| timestamp        | varchar(20)  | NO   |     | 0000-00-00 00:00:00 |                |
| user             | tinytext     | NO   |     | NULL                |                |
| userid           | int(10)      | NO   |     | 0                   |                |
+------------------+--------------+------+-----+---------------------+----------------+

I want to get the count of votes for each aid per day for the last 7 days, with "0"s for the dates where there are no votes for an aid. The timestamp column holds a Unix timestamp.

Any help is highly appreciated.


Solution

  • MySQL versions before 8.0 don't have recursive functionality (recursive common table expressions arrived in 8.0), so on those versions you're left with using the NUMBERS table trick -

    1. Create a table that only holds incrementing numbers - easy to do using an auto_increment:

      DROP TABLE IF EXISTS `example`.`numbers`;
      CREATE TABLE  `example`.`numbers` (
        `id` int(10) unsigned NOT NULL auto_increment,
         PRIMARY KEY  (`id`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
      
    2. Populate the table using:

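      INSERT INTO numbers
        (id)
      VALUES
        (NULL)
      

      ...repeated (or with additional (NULL) rows in the VALUES list) for as many values as you need. The queries below need at least seven rows. Note that the table name is written in lowercase to match the CREATE TABLE statement, because table names are case-sensitive on most Linux installations.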

    3. Use DATE_SUB to construct a list of dates, offsetting the days by the numbers.id value. In this example, I subtract (id - 1) days from CURRENT_DATE to get a list of sequential date values for the last week, today included. To cover a fixed range instead, apply DATE_ADD to your start date in the same way and filter on your end date (both in YYYY-MM-DD format) -

      SELECT x.dt
        FROM (SELECT DATE_SUB(CURRENT_DATE, INTERVAL (n.id - 1) DAY) AS dt
                FROM numbers n
               WHERE n.id <= 7 ) x
      
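    4. LEFT JOIN onto your table of data based on the date portion of the timestamp. Because it is an outer join, dates without matching votes are kept and COUNT returns 0 for them.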

         SELECT x.dt,
                 COUNT(v.aid) AS num
           FROM (SELECT DATE_SUB(CURRENT_DATE, INTERVAL (n.id - 1) DAY) AS dt
                   FROM numbers n
                  WHERE n.id <= 7 ) x
      LEFT JOIN votes v ON DATE(FROM_UNIXTIME(v.timestamp)) = DATE(x.dt)
       GROUP BY x.dt
       ORDER BY x.dt
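
The query in step 4 counts all votes per day. Since the question asks for a count per aid per day, one possible extension is sketched below. It assumes the same numbers and votes tables; the aliases d and a and the derived list of distinct aid values are illustrative choices, not part of the original answer. An aid that has never received a vote does not appear in votes at all, so it will not show up here.

```sql
-- Sketch: one row per (day, aid) for the last 7 days, with 0 where there are no votes.
-- Assumes numbers has at least 7 rows and votes.timestamp holds a Unix timestamp.
SELECT d.dt,
       a.aid,
       COUNT(v.id) AS num                      -- counts only matched vote rows
  FROM (SELECT DATE_SUB(CURRENT_DATE, INTERVAL (n.id - 1) DAY) AS dt
          FROM numbers n
         WHERE n.id <= 7) d
 CROSS JOIN (SELECT DISTINCT aid FROM votes) a -- every known aid for every day
  LEFT JOIN votes v
         ON v.aid = a.aid
        AND DATE(FROM_UNIXTIME(v.timestamp)) = d.dt
 GROUP BY d.dt, a.aid
 ORDER BY d.dt, a.aid;
```

The CROSS JOIN builds the full day-by-aid grid first, so the LEFT JOIN has a row to hang a zero on for every combination.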
      

    Why Numbers, not Dates?

    Simple - dates can be generated based on the number, like in the example I provided. It also means maintaining a single helper table, versus, say, one per data type.
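
As a small illustration of that point, the same numbers table can drive other sequences. The sketch below lists the last 24 hours and the first day of each of the last 12 months; it assumes numbers holds at least 24 rows, and the column aliases are made up for the example.

```sql
-- Sketch: the last 24 hourly offsets, counted back from the current time.
SELECT DATE_SUB(NOW(), INTERVAL (n.id - 1) HOUR) AS hour_point
  FROM numbers n
 WHERE n.id <= 24;

-- Sketch: the first day of each of the last 12 months, current month included.
SELECT DATE_SUB(DATE_FORMAT(CURRENT_DATE, '%Y-%m-01'),
                INTERVAL (n.id - 1) MONTH) AS month_start
  FROM numbers n
 WHERE n.id <= 12;
```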

    Previously (without the numbers table; days with no votes are missing from the result rather than shown as 0):

      SELECT DATE(FROM_UNIXTIME(v.timestamp)) AS dt,
             COUNT(v.aid)
        FROM votes v
       WHERE DATE(FROM_UNIXTIME(v.timestamp)) BETWEEN DATE_SUB(CURRENT_DATE, INTERVAL 6 DAY)
                                                  AND CURRENT_DATE
    GROUP BY DATE(FROM_UNIXTIME(v.timestamp))
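
For readers on MySQL 8.0 or later, which added recursive common table expressions, the date list can be generated without a numbers table. The following is a minimal sketch under that version assumption; the CTE name last_days is made up for the example, and the votes table is the one from the question.

```sql
-- Sketch for MySQL 8.0+: build the last 7 dates with a recursive CTE.
WITH RECURSIVE last_days (dt) AS (
    SELECT CURRENT_DATE                        -- anchor row: today
    UNION ALL
    SELECT dt - INTERVAL 1 DAY                 -- step back one day at a time
      FROM last_days
     WHERE dt > CURRENT_DATE - INTERVAL 6 DAY  -- stop once 7 dates exist
)
SELECT d.dt,
       COUNT(v.id) AS num                      -- 0 for days with no votes
  FROM last_days d
  LEFT JOIN votes v
         ON DATE(FROM_UNIXTIME(v.timestamp)) = d.dt
 GROUP BY d.dt
 ORDER BY d.dt;
```

On older versions the WITH RECURSIVE syntax is rejected, so the numbers table approach remains the portable option.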