Search code examples
mysqloptimizationquery-optimization

Any way to optimize this MySQL query? (Resource intense)


My app needs to run this query pretty often, which gets a list of user data for the app to display. The problem is that subquery about the user_quiz is resource heavy and calculating the rankings are also very CPU intense too. Benchmark: ~.5 second each run When it will be run:

  • When the user want to see their ranking
  • When the user want to see other people's ranking
  • Getting a list of user's friends

.5 second it's a really long time considering this query will be run pretty often. Is there anything I could do to optimize this query?

Table for user:

CREATE TABLE `user` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `firstname` varchar(100) DEFAULT NULL,
 `lastname` varchar(100) DEFAULT NULL,
 `password` varchar(20) NOT NULL,
 `email` varchar(300) NOT NULL,
 `verified` tinyint(10) DEFAULT NULL,
 `avatar` varchar(300) DEFAULT NULL,
 `points_total` int(11) unsigned NOT NULL DEFAULT '0',
 `points_today` int(11) unsigned NOT NULL DEFAULT '0',
 `number_correctanswer` int(11) unsigned NOT NULL DEFAULT '0',
 `number_watchedvideo` int(11) unsigned NOT NULL DEFAULT '0',
 `create_time` datetime NOT NULL,
 `type` tinyint(1) unsigned NOT NULL DEFAULT '1',
 `number_win` int(11) unsigned NOT NULL DEFAULT '0',
 `number_lost` int(11) unsigned NOT NULL DEFAULT '0',
 `number_tie` int(11) unsigned NOT NULL DEFAULT '0',
 `level` int(1) unsigned NOT NULL DEFAULT '0',
 `islogined` tinyint(1) unsigned NOT NULL DEFAULT '0',
 PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=230 DEFAULT CHARSET=utf8;

Table for user_quiz:

CREATE TABLE `user_quiz` (
 `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
 `user_id` int(11) NOT NULL,
 `question_id` int(11) NOT NULL,
 `is_answercorrect` int(11) unsigned NOT NULL DEFAULT '0',
 `question_answer_datetime` datetime NOT NULL,
 `score` int(1) DEFAULT NULL,
 `quarter` int(1) DEFAULT NULL,
 `game_type` int(1) DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=9816 DEFAULT CHARSET=utf8;

Table for user_starter:

CREATE TABLE `user_starter` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `user_id` int(11) DEFAULT NULL,
 `result` int(1) DEFAULT NULL,
 `created_date` date DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=456 DEFAULT CHARSET=utf8mb4;

My indexes:

Table: user
Table    Non_unique    Key_name    Seq_in_index    Column_name    Collation    Cardinality    Sub_part    Packed    Null    Index_type    Comment    Index_comment
user    0    PRIMARY    1    id    A    32                BTREE

Table: user_quiz
Table    Non_unique    Key_name    Seq_in_index    Column_name    Collation    Cardinality    Sub_part    Packed    Null    Index_type    Comment    Index_comment
user_quiz    0    PRIMARY    1    id    A    9462                BTREE
user_quiz    1    user_id    1    user_id    A    270                BTREE

Table: user_starter
Table    Non_unique    Key_name    Seq_in_index    Column_name    Collation    Cardinality    Sub_part    Packed    Null    Index_type    Comment    Index_comment
user_starter    0    PRIMARY    1    id    A    454                BTREE
user_starter    1    user_id    1    user_id    A    227            YES    BTREE

Query:

SET @curRank = 0;
SET @lastPlayerPoints = 0;
SELECT
  sub.*,
  @curRank := IF(@lastPlayerPoints!=points_week, @curRank + 1, @curRank) AS rank,
    @lastPlayerPoints := points_week AS db_PPW
FROM (
  SELECT u.id,u.firstname,u.lastname,u.email,u.avatar,u.type,u.points_total,u.number_win,u.number_lost,u.number_tie,u.verified,
    COALESCE(SUM(uq.score),0) as points_week,
    COALESCE(us.number_lost,0) as number_week_lost,
    COALESCE(us.number_win,0) as number_week_win,
    (select MAX(question_answer_datetime) from user_quiz WHERE user_id = u.id and game_type = 1) as lastFrdFight,
    (select MAX(question_answer_datetime) from user_quiz WHERE user_id = u.id and game_type = 2) as lastBotFight
  FROM `user` u
  LEFT JOIN (SELECT user_id,
    count(case when result=1 then 1 else null end) as number_win,
    count(case when result=-1 then 1 else null end) as number_lost
    from user_starter where created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27' ) us ON u.id = us.user_id
  LEFT JOIN (SELECT * FROM user_quiz WHERE question_answer_datetime BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 00:00:00') uq on u.id = uq.user_id
  GROUP BY u.id ORDER BY points_week DESC, u.lastname ASC, u.firstname ASC
) as sub

EXPLAIN:

id    select_type    table    type    possible_keys    key    key_len    ref    rows    filtered    Extra
1    PRIMARY    <derived2>    ALL                    3027    100    
2    DERIVED    u    ALL    PRIMARY                32    100    Using temporary; Using filesort
2    DERIVED    <derived5>    ALL                    1    100    Using where; Using join buffer (Block Nested Loop)
2    DERIVED    <derived6>    ref    <auto_key0>    <auto_key0>    4    fancard.u.id    94    100    
6    DERIVED    user_quiz    ALL                    9461    100    Using where
5    DERIVED    user_starter    ALL                    454    100    Using where
4    DEPENDENT SUBQUERY    user_quiz    ref    user_id    user_id    4    func    35    100    Using where
3    DEPENDENT SUBQUERY    user_quiz    ref    user_id    user_id    4    func    35    100    Using where

Example output and expected output: enter image description here

Bench mark: around .5 second


Solution

  • The following index should make the subquery to user_quiz ultra fast.

    ALTER TABLE user_quiz
    ADD INDEX (`user_id`,`game_type`,`question_answer_datetime`)
    

    Please provide SHOW CREATE TABLE tablename statements for all tables, as that will help with additional optimizations.

    Update #1

    Alright, I've had some time to look things over, and fortunately there a appears to be a lot of relatively low hanging fruit in terms of optimization.

    Here are all the indexes to add:

    ALTER TABLE user_quiz
    ADD INDEX `userGametypeAnswerDatetimes` (`user_id`,`game_type`,`question_answer_datetime`)
    
    ALTER TABLE user_quiz
    ADD INDEX `userAnswerScores` (`user_id`,`question_answer_datetime`,`score`)
    
    ALTER TABLE user_starter
    ADD INDEX `userResultDates` (`user_id`,`result`,`created_date`)
    

    Note that the names (such as userGametypeAnswerDatetimes) are optional, and you can name them to whatever makes the most sense to you. But, in general, it's good to put specific names on your custom indexes (simply for organization purposes.)

    Now, here is your query that should work will with those new indexes:

    SET @curRank = 0;
    SET @lastPlayerPoints = 0;
    SELECT
    sub.*,
    @curRank := IF(@lastPlayerPoints!=points_week, @curRank + 1, @curRank) AS rank,
    @lastPlayerPoints := points_week AS db_PPW
    FROM (
        SELECT u.id,
        u.firstname,
        u.lastname,
        u.email,
        u.avatar,
        u.type,
        u.points_total,
        u.number_win,
        u.number_lost,
        u.number_tie,
        u.verified,
        COALESCE(user_scores.score,0) as points_week,
        COALESCE(user_losses.number_lost,0) as number_week_lost,
        COALESCE(user_wins.number_win,0) as number_week_win,
        (
            select MAX(question_answer_datetime) 
            from user_quiz 
            WHERE user_id = u.id and game_type = 1
        ) as lastFrdFight,
        (
            select MAX(question_answer_datetime) 
            from user_quiz 
            WHERE user_id = u.id 
            and game_type = 2
        ) as lastBotFight
        FROM `user` u
        LEFT OUTER JOIN (
            SELECT user_id,
            COUNT(*) AS number_won
            from user_starter 
            WHERE created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27'
            AND result = 1
            GROUP BY user_id
        ) user_wins
        ON user_wins.user_id = u.user_id
        LEFT OUTER JOIN (
            SELECT user_id,
            COUNT(*) AS number_lost
            from user_starter 
            WHERE created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27'
            AND result = -1
            GROUP BY user_id
        ) user_losses
        ON user_losses.user_id = u.user_id
        LEFT OUTER JOIN (
            SELECT SUM(score)
            FROM user_quiz 
            WHERE question_answer_datetime 
            BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 00:00:00'
            GROUP BY user_id
        ) user_scores
        ON u.id = user_scores.user_id
        ORDER BY points_week DESC, u.lastname ASC, u.firstname ASC
    ) as sub
    

    Note: This is not necessarily the best result. It depends a LOT on your data set, as to whether this is necessarily the best, and sometimes you need to do a bit of trial and error.

    A hint as to what you can use trial and error on is the structure of how we query the lastFrdFight and lastBotFight verses how we query points_week, number_week_lost, number_week_win. All of these could either be done in the select statement (like the first two are in my query) or could be done by joining to a subquery result (like the last three do, in my query.)

    Mix and match to see what works best. In general, I've found the joining to a subquery to be fastest when you have a large number of rows in the outer query (in this case, querying the user table.) This is because it only needs to get the results once, and then can just match them up on a user by user basis. Other times, it can be better to have the query just in the SELECT clause - this will run MUCH faster, since there are more constants (the user_id is already known), but has to run for each row. So it's a trade off, and why you sometimes need to use trial and error.

    Why do the indexes work?

    So, you may be wondering why I made the indexes as I did. If you are familiar with phone books (in this age of smartphones, that's no longer a valid assumption I can make) then we can use that as an analogy:

    If you had a composite index of phonebookIndex (lastname,firstname,email) on your user table (example here! you don' actually need to add that index!) you would have a result similar to what a phone book provides. (Using email instead of phone number.)

    Each index is an internal copy of the data in the overall table. With this phonebookIndex there would internally be stored a list of all users with their lastname, then their first name, and then their email, and each of these would be ordered, just like a phone book.

    Why is that useful? Consider when you know someone's first and last name. You can quickly flip to where their last name is, then quickly go through that list of everyone with their last name, finding the first name you want, so obtaining the email.

    Indexes work in exactly the same way, in terms of how the database looks at them.

    Consider the userGametypeAnswerDatetimes index I defined above, and how we query that index in the lastFrdFight SELECT subquery.

    (
        select MAX(question_answer_datetime) 
        from user_quiz 
        WHERE user_id = u.id and game_type = 1
    ) as lastFrdFight
    

    Notice how we have both the user_id (from the outer query) and the game_type as constants. That is exactly like our example earlier, with having the first and last name, and wanting to look up an email/phone number. In this case, we are looking for the MAX of the 3rd value in the index. Still easy: All the values are ordered, so if this index was sitting in front of us, we could just flip to the specific user_id, then look at the section with all game_type=1 and then just pick the last value to find the maximum. Very very fast. Same for the database. It can find this value extremely fast, which is why you saw an 80%+ reduction in your overall query time.

    So, that's how indexes work, and why I choose these indexes as I did.

    Be aware, that the more indexes you have, the more you'll see slowdowns when doing inserts and updates. But, if you are reading a lot more from your tables than you are writing, this is usually a more than acceptable trade off.

    So, give these changes a shot, and let me know how it performs. Please provide the new EXPLAIN plan if you want further optimization help. Also, this should give you quite a bit of tools to use trial and error to see what does work at what doesn't. All my changes are fairly independent of each other, so you can swap them in and out with your original query pieces to see how each one works.