Search code examples
mysqlsumquery-optimizationaverage

mysql query running too slow for discontinuous data


I am new to MySQL, and trying to using MySQL on the project, basically was tracking players performance. Below is the table fields.

+-------------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
| Field             | Type                 | Collation         | Null | Key | Default | Extra          | Privileges                      | Comment |
+-------------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
| unique_id         | int(11)              | NULL              | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
| record_time       | datetime             | NULL              | NO   |     | NULL    |                | select,insert,update,references |         |
| game_sourceid     | char(20)             | latin1_swedish_ci | NO   | MUL | NULL    |                | select,insert,update,references |         |
| game_number       | smallint(6)          | NULL              | NO   |     | NULL    |                | select,insert,update,references |         |
| game_difficulty   | char(12)             | latin1_swedish_ci | NO   | MUL | NULL    |                | select,insert,update,references |         |
| cost_time         | smallint(5) unsigned | NULL              | NO   | MUL | NULL    |                | select,insert,update,references |         |
| country           | char(3)              | latin1_swedish_ci | NO   |     | NULL    |                | select,insert,update,references |         |
| source            | char(7)              | latin1_swedish_ci | NO   |     | NULL    |                | select,insert,update,references |         |
+-------------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+

and I have adding game_sourceid and game_difficulty as index and the engine is innodb.

I have insert about 11m rows of test data into this table, which is generated randomly but resembles the real data.

Basically the mostly query was like this, to get the average time and best time for a specific game_sourceid

SELECT avg(cost_time) AS avgtime
    , min(cost_time) AS mintime
    , count(*) AS count
FROM statistics_work_table
WHERE game_sourceid = 'standard_easy_1';

+-----------+---------+--------+
| avgtime   | mintime | count  |
+-----------+---------+--------+
| 1681.2851 |     420 | 138034 |
+-----------+---------+--------+
1 row in set (4.97 sec)

and the query took about 5s

I have googled about this and someone said that may caused by the amout of query count, so I am trying to narrow down the scope like this

SELECT avg(cost_time) AS avgtime
    , min(cost_time) AS mintime
    , count(*) AS count
FROM statistics_work_table
WHERE game_sourceid = 'standard_easy_1'
    AND record_time > '2015-11-19 04:40:00';

+-----------+---------+-------+
| avgtime   | mintime | count |
+-----------+---------+-------+
| 1275.2222 |     214 |     9 |
+-----------+---------+-------+

1 row in set (4.46 sec)

As you can see the 9 rows data also took about 5s, so i think it's not the problem about the query count.

The test data was generated randomly to simulate the real user's activity, so the data was discontinuous, so i added more continuous data(about 250k) with the same game_sourceid='standard_easy_9' but keep all others randomly, in other words the last 250k rows in this table has the same game_sourceid. And i'm trying to query like this:

SELECT avg(cost_time) AS avgtime
    , min(cost_time) AS mintime
    , count(*) AS count
FROM statistics_work_table
WHERE game_sourceid = 'standard_easy_9';

+-----------+---------+--------+
| avgtime   | mintime | count  |
+-----------+---------+--------+
| 1271.4806 |      70 | 259379 |
+-----------+---------+--------+
1 row in set (0.40 sec)

This time the query magically took only 0.4s, that's totally beyond my expectations.

So here's the question, the data was retrived from the player at real time, so it must be randomly and discontinuous.

I am thinking of separating the data into multiple tables by the game_sourceid, but it will take another 80 tables for that, maybe more in the future.

Since I am new to MySQL, I am wondering if there are any other solutions for this, or just my query was too bad.

Update: Here's the index of my table

mysql> show index from statistics_work_table;

+-----------------------+------------+-------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table                 | Non_unique | Key_name                | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------------+------------+-------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| statistics_work_table |          0 | PRIMARY                 |            1 | unique_id       | A         |    11362113 |     NULL | NULL   |      | BTREE      |         |               |
| statistics_work_table |          1 | GameSourceId_CostTime   |            1 | game_sourceid   | A         |          18 |     NULL | NULL   |      | BTREE      |         |               |
| statistics_work_table |          1 | GameSourceId_CostTime   |            2 | cost_time       | A         |      344306 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------------+------------+-------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Solution

  • ALTER TABLE `statistics_work_table`
    ADD INDEX `GameSourceId_CostTime` (`game_sourceid`,`cost_time`)
    

    This index should make your queries super fast. Also, after you run the above statement, you should drop the single column index you have on game_sourceid, as the above will make the single column one redundant. (Which will hurt insert speed.)

    The reason your queries are slow is because the database is using your single column index on game_sourceid, finding the rows, and then, for each row, using the primary key that is stored along with the index to find the main clustered index (aka primary key in this, and most cases), and then looking up the cost_time value. This is referred to as a double lookup, and it is something you want to avoid.

    The index I provided above is called a "covering index". It allows your query to use ONLY the index, and so you only need a single lookup per row, greatly improving performance.