Search code examples
mysqlsql-optimization

Is it possible to optimize these queries any more?


Schema is like this visits_table:

+---------------------------+----------------------+------+-----+---------+----------------+
| Field                     | Type                 | Null | Key | Default | Extra          |
+---------------------------+----------------------+------+-----+---------+----------------+
| idvisit                   | int(10) unsigned     | NO   | PRI | NULL    | auto_increment |
| idsite                    | int(10) unsigned     | NO   | MUL | NULL    |                |
| idvisitor                 | binary(8)            | NO   |     | NULL    |                |
| visit_time                | datetime             | NO   |     | NULL    |                |
| user_id                   | varchar(200)         | YES  |     | NULL    |                |
| config_cookie             | tinyint(1)           | NO   |     | NULL    |                |
| custom_var_k1             | varchar(200)         | YES  |     | NULL    |                |
| custom_var_v1             | varchar(200)         | YES  |     | NULL    |                |
+---------------------------+----------------------+------+-----+---------+----------------+

Indexes:

+----------------------+------------+------------------------------+--------------+------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table                | Non_unique | Key_name                     | Seq_in_index | Column_name            | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------------+------------+------------------------------+--------------+------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| visits_table         |          0 | PRIMARY                      |            1 | idvisit                | A         |        1502 | NULL     | NULL   |      | BTREE      |         |               |
| visits_table         |          1 | index_idsite_datetime        |            1 | idsite                 | A         |           5 | NULL     | NULL   |      | BTREE      |         |               |
| visits_table         |          1 | index_idsite_datetime        |            2 | visit_time             | A         |        1502 | NULL     | NULL   |      | BTREE      |         |               |
| visits_table         |          1 | index_idsite_idvisitor       |            1 | idsite                 | A         |           1 | NULL     | NULL   |      | BTREE      |         |               |
| visits_table         |          1 | index_idsite_idvisitor       |            2 | idvisitor              | A         |         500 | NULL     | NULL   |      | BTREE      |         |               |
+----------------------+------------+------------------------------+--------------+------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

and I've prepared two queries:

SELECT 
    COUNT(`idvisit`) AS `visits_count`, 
    DATE(`visit_time`) AS `date` 
FROM (
    SELECT * 
    FROM 
        `visits_table` 
    WHERE 
        `idsite` = 2 
        AND `visit_time` >= '2015-04-01 00:00:00' 
        AND `visit_time` <= '2015-04-30 23:59:59'
) AS `visits`
WHERE 1 
GROUP BY 
    DATE(`visit_time`); 



+----+-------------+----------------------+------+----------------------------------------------+------+---------+------+------+---------------------------------+
| id | select_type | table                | type | possible_keys                                | key  | key_len | ref  | rows | Extra                           |
+----+-------------+----------------------+------+----------------------------------------------+------+---------+------+------+---------------------------------+
|  1 | PRIMARY     | <derived2>           | ALL  | NULL                                         | NULL | NULL    | NULL | 1469 | Using temporary; Using filesort |
|  2 | DERIVED     | visits_table         | ALL  | index_idsite_datetime,index_idsite_idvisitor | NULL | NULL    | NULL | 1502 | Using where                     |
+----+-------------+----------------------+------+----------------------------------------------+------+---------+------+------+---------------------------------+

in MySQL 5.6 in row 2 type = ref, key = index_idsite_datetime, key_len = 4, ref = const, Extra = Using index

    SELECT 
        COUNT(`idvisit`) AS `visits_count`, 
        DATE(`visit_time`) AS `date` 
    FROM 
        `visits_table` 
    WHERE 
        `idsite` = 2 
        AND `visit_time` >= '2015-04-01 00:00:00' 
        AND `visit_time` <= '2015-04-30 23:59:59'
    GROUP BY 
        DATE(`visit_time`);

+----+-------------+----------------------+-------+----------------------------------------------+-----------------------+---------+------+------+-----------------------------------------------------------+
| id | select_type | table                | type  | possible_keys                                | key                   | key_len | ref  | rows | Extra                                                     |
+----+-------------+----------------------+-------+----------------------------------------------+-----------------------+---------+------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | visits_table         | range | index_idsite_datetime,index_idsite_idvisitor | index_idsite_datetime | 12      | NULL | 1468 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+----------------------+-------+----------------------------------------------+-----------------------+---------+------+------+-----------------------------------------------------------+

I have table with 86M rows and both queries take about 2 hours to execute. Is there anything I can do to speed up these queries?


Solution

  • I would suggest writing the query as:

    SELECT COUNT(*) AS `visits_count`, 
           DATE(`visit_time`) AS `date` 
    FROM  `visits_table` 
    WHERE  `idsite` = 2 AND
           `visit_time` >= '2015-04-01' AND
           `visit_time` < '2015-05-01'
    GROUP BY DATE(`visit_time`);
    

    This might save an iota of time, because the index is now a covering index.

    One way I can think to improve the query is to get rid of the group by. Try a query like this:

    select dte,
           (select count(*)
            from visits_table
            where idsite = 2 and
                  visit_time >= dates.dte AND visit_time < dates.dte + interval 1 day
    from (select date('2015-04-01') as dte union all
          select date('2015-04-02') as dte
         ) dates;
    

    MySQL is much better about using indexes for correlated subqueries than it is about using indexes for aggregation. The downside to this approach is that time will increase linearly with the number of days in the result set.