Search code examples
mysqlindexinggroup-bydenormalization

Add index to generated column


First, sorry if the used terms are not right. I'm not a mySQL-professional.

I have a table like this :

CREATE TABLE `accesses` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `time` int(11) DEFAULT NULL,
  `accessed_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_accesses_on_accessed_at` (`accessed_at`)
) ENGINE=InnoDB AUTO_INCREMENT=9278483 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

This table has 10.000.000 rows in it. I use it to generate charts, with queries like this :

SELECT SUM(time) AS value, DATE(created_at) AS date
FROM `accesses`
GROUP BY date;

This query is very long (more than 1 minute). I'm doing lots of others queries (with AVG, MIN or MAX instead of SUM, or with a WHERE on a specific day or month, or GROUP BY HOUR(created_at), etc...)

I want to optimize it. The best idea I have is to add several columns, with redundancy, like DATE(created_at), HOUR(created_at), MONTH(created_at), then add an index on it.

... Is this solution good or is there any other one ?

Regards


Solution

  • Yes, it can be an optimization to store data redundantly in permanent columns with an index to optimize certain queries. This is one example of denormalization.

    Depending on the amount of data and the frequency of queries, this can be an important speedup (@Marshall Tigerus downplays it too much, IMHO).

    I tested this out by running EXPLAIN:

    mysql> explain SELECT SUM(time) AS value, DATE(created_at) AS date FROM `accesses` GROUP BY date\G                      *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: accesses
       partitions: NULL
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 1
         filtered: 100.00
            Extra: Using temporary; Using filesort
    

    Ignore the fact that the table is empty in my test. The important part is Using temporary; Using filesort which are expensive operations, especially if your temp table gets so large that MySQL can't fit it in memory.

    I added some columns and indexes on them:

    mysql> alter table accesses add column cdate date, add key (cdate), 
      add column chour tinyint, add key (chour), 
      add column cmonth tinyint, add key (cmonth);
    
    mysql> explain SELECT SUM(time) AS value, cdate FROM `accesses` GROUP BY cdate\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: accesses
       partitions: NULL
             type: index
    possible_keys: cdate
              key: cdate
          key_len: 4
              ref: NULL
             rows: 1
         filtered: 100.00
            Extra: NULL
    

    The temporary table and filesort went away, because MySQL knows it can do an index scan to process the rows in the correct order.