Search code examples
mysqlsqlquery-optimization

MYSQL: Optimizing MySQL query with group by and inner join


I have a table with 19489578 postings. When I query table with a select statement, group by lockroomId field of the postings table, it run in very slow query times (>3 second). Here's the schema of the story table:

CREATE TABLE `Story` (
  `storyId` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `storyType` int(11) DEFAULT '0',
  `parentStoryId` bigint(20) unsigned DEFAULT NULL,
  `ownerId` bigint(20) unsigned NOT NULL,
  `lockroomId` bigint(20) unsigned DEFAULT NULL,
  `isHost` tinyint(1) DEFAULT '0',
  `updatedAt` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
  `createdAt` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
  `message` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  `imageUrl` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `amount` int(11) DEFAULT NULL,
  `location` varchar(45) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `privacy` tinyint(1) NOT NULL DEFAULT '0',
  `minutes` int(11) DEFAULT '0',
  `hasCoinBadge` tinyint(1) DEFAULT '0',
  `hasFriendBadge` tinyint(1) DEFAULT '0',
  `localDBId` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  `tagName` varchar(63) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT '',
  `tagColor` varchar(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT '',
  `tagId` bigint(20) DEFAULT NULL,
  `numLikes` int(11) DEFAULT '0',
  `numComments` int(11) DEFAULT '0',
  `lastCommenterId` bigint(20) unsigned DEFAULT NULL,
  `lastCommentTimestamp` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
  `lastComment` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `secondLastCommenterId` bigint(20) unsigned DEFAULT NULL,
  `secondLastCommentTimestamp` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
  `secondLastComment` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `lastLikerId` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`storyId`),
  UNIQUE KEY `StoryId_UNIQUE` (`storyId`),
  KEY `ownerId` (`ownerId`),
  KEY `parentStoryId` (`parentStoryId`),
  KEY `updatedAt` (`updatedAt`),
  KEY `ownerId_updatedAt` (`ownerId`,`updatedAt`),
  KEY `updatedAt_ownerId_descending_index` (`updatedAt` DESC,`ownerId`),
  KEY `lockroomId` (`lockroomId`),
  CONSTRAINT `ownerId` FOREIGN KEY (`ownerId`) REFERENCES `User` (`userId`),
  CONSTRAINT `parentStoryId` FOREIGN KEY (`parentStoryId`) REFERENCES `Story` (`storyId`)
) ENGINE=InnoDB AUTO_INCREMENT=19503337 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
+----------------------------+---------------------+------+-----+----------------------+-------------------+
| Field                      | Type                | Null | Key | Default              | Extra             |
+----------------------------+---------------------+------+-----+----------------------+-------------------+
| storyId                    | bigint(20) unsigned | NO   | PRI | NULL                 | auto_increment    |
| storyType                  | int(11)             | YES  |     | 0                    |                   |
| parentStoryId              | bigint(20) unsigned | YES  | MUL | NULL                 |                   |
| ownerId                    | bigint(20) unsigned | NO   | MUL | NULL                 |                   |
| lockroomId                 | bigint(20) unsigned | YES  | MUL | NULL                 |                   |
| isHost                     | tinyint(1)          | YES  |     | 0                    |                   |
| updatedAt                  | timestamp(6)        | YES  | MUL | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| createdAt                  | timestamp(6)        | YES  |     | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| message                    | text                | YES  |     | NULL                 |                   |
| imageUrl                   | varchar(255)        | YES  |     | NULL                 |                   |
| amount                     | int(11)             | YES  |     | NULL                 |                   |
| location                   | varchar(45)         | YES  |     | NULL                 |                   |
| privacy                    | tinyint(1)          | NO   |     | 0                    |                   |
| minutes                    | int(11)             | YES  |     | 0                    |                   |
| hasCoinBadge               | tinyint(1)          | YES  |     | 0                    |                   |
| hasFriendBadge             | tinyint(1)          | YES  |     | 0                    |                   |
| localDBId                  | text                | YES  |     | NULL                 |                   |
| tagName                    | varchar(63)         | YES  |     |                      |                   |
| tagColor                   | varchar(15)         | YES  |     |                      |                   |
| tagId                      | bigint(20)          | YES  |     | NULL                 |                   |
| numLikes                   | int(11)             | YES  |     | 0                    |                   |
| numComments                | int(11)             | YES  |     | 0                    |                   |
| lastCommenterId            | bigint(20) unsigned | YES  |     | NULL                 |                   |
| lastCommentTimestamp       | timestamp(6)        | YES  |     | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| lastComment                | varchar(256)        | YES  |     | NULL                 |                   |
| secondLastCommenterId      | bigint(20) unsigned | YES  |     | NULL                 |                   |
| secondLastCommentTimestamp | timestamp(6)        | YES  |     | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| secondLastComment          | varchar(256)        | YES  |     | NULL                 |                   |
| lastLikerId                | bigint(20) unsigned | YES  |     | NULL                 |                   |
+----------------------------+---------------------+------+-----+----------------------+-------------------+

I also add lockroomId as Index. Here is the index of Story Table:

+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name                           | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Story |          0 | PRIMARY                            |            1 | storyId       | A         |    18951234 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          0 | StoryId_UNIQUE                     |            1 | storyId       | A         |    19116608 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | ownerId                            |            1 | ownerId       | A         |     1212686 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | parentStoryId                      |            1 | parentStoryId | A         |           1 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | updatedAt                          |            1 | updatedAt     | A         |    19022172 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | ownerId_updatedAt                  |            1 | ownerId       | A         |      945172 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | ownerId_updatedAt                  |            2 | updatedAt     | A         |    19112336 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | updatedAt_ownerId_descending_index |            1 | updatedAt     | D         |    19091560 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | updatedAt_ownerId_descending_index |            2 | ownerId       | A         |    18649716 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | lockroomId                         |            1 | lockroomId    | A         |           1 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+

And here's the query that takes >3 second to execute. What this query do is to get the jointStory that user create together with same lockroomId but different storyId. As a result, I group the story by lockroomId.

SELECT 
    *
FROM
    ((SELECT 
        Story.*,
            jointStory.guestIds,
            jointStory.guestStoryIds,
            jointStory.guestTrees,
            jointStory.guestDurations,
            jointStory.guestTagIds
    FROM
        Story
    INNER JOIN (SELECT 
        lockroomId,
            GROUP_CONCAT(Story.ownerId) AS guestIds,
            GROUP_CONCAT(Story.storyId) AS guestStoryIds,
            GROUP_CONCAT(Story.imageUrl) AS guestTrees,
            GROUP_CONCAT(Story.minutes) AS guestDurations,
            GROUP_CONCAT(Story.tagId) AS guestTagIds
    FROM
        Story
    WHERE
        Story.storyType = 1
            AND lockroomId IS NOT NULL
            AND createdAt < FROM_UNIXTIME(1600444809)
    GROUP BY Story.lockroomId
    HAVING FIND_IN_SET(1349147, guestIds) ORDER BY createdAt DESC
    LIMIT 10) jointStory ON jointStory.lockroomId = Story.lockroomId
    WHERE
        Story.storyType = 2
            AND Story.lockroomId IS NOT NULL
            AND Story.createdAt < FROM_UNIXTIME(1600444809)
    ORDER BY createdAt DESC
    LIMIT 10) UNION ALL SELECT 
        Story.*, NULL, NULL, NULL, NULL, NULL
    FROM
        Story
    WHERE
        Story.storyType = 0
            AND ownerId = 1349147
            AND createdAt < FROM_UNIXTIME(1600444809)
    ORDER BY createdAt DESC
    LIMIT 10) stories
        LEFT JOIN
    StoryEngagement ON StoryEngagement.storyId = stories.storyId
        AND StoryEngagement.userId = 1349147
ORDER BY createdAt DESC
LIMIT 10;

The result of the explain extended command on this query shows that MySQL is using filesort and using index condition:

+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------------------+--------+----------+---------------------------------------------------------------+
| id | select_type | table      | partitions | type  | possible_keys | key         | key_len | ref                         | rows   | filtered | Extra                                                         |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------------------+--------+----------+---------------------------------------------------------------+
|  1 | PRIMARY     | Story      | NULL       | range | lockroomId    | lockroomId  | 9       | NULL                        | 287540 |     3.33 | Using index condition; Using where; Using MRR; Using filesort |
|  1 | PRIMARY     | <derived2> | NULL       | ref   | <auto_key0>   | <auto_key0> | 9       | flora_test.Story.lockroomId |     10 |   100.00 | NULL                                                          |
|  2 | DERIVED     | Story      | NULL       | range | lockroomId    | lockroomId  | 9       | NULL                        | 287540 |     3.33 | Using index condition; Using where                            |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------------------+--------+----------+---------------------------------------------------------------+

It would be very appreciate if you have any ideas on how to optimize this query. It would be great if there is a way that do not need to create index and make this query run faster, because this is a huge production table and adding index take very long time, also adding some new index may cause other query run slower. I have already tried a few things such as creating a combined index on group by fields( index(createdAt,lockroomId)) and using FORCE INDEX(createdAt,lockroomId) as suggested by a few blog postings, but run even slower...

UPDATE

After applied the suggestion, here is the result of EXPLAIN SELECE

+----+--------------+-----------------+------------+--------+------------------------------------------------------------------------+--------------------------+---------+-----------------------------+-------+----------+---------------------------------------------------------------------+
| id | select_type  | table           | partitions | type   | possible_keys                                                          | key                      | key_len | ref                         | rows  | filtered | Extra                                                               |
+----+--------------+-----------------+------------+--------+------------------------------------------------------------------------+--------------------------+---------+-----------------------------+-------+----------+---------------------------------------------------------------------+
|  1 | PRIMARY      | <derived2>      | NULL       | ALL    | NULL                                                                   | NULL                     | NULL    | NULL                        |    12 |   100.00 | Using filesort                                                      |
|  1 | PRIMARY      | StoryEngagement | NULL       | eq_ref | PRIMARY                                                                | PRIMARY                  | 16      | const,stories.storyId       |     1 |   100.00 | Using where                                                         |
|  2 | DERIVED      | <derived3>      | NULL       | ALL    | NULL                                                                   | NULL                     | NULL    | NULL                        |    10 |   100.00 | Using where; Using temporary; Using filesort                        |
|  2 | DERIVED      | Story           | NULL       | ref    | lockroomId,combine_lockroomId_Index,combine_story_index                | combine_story_index      | 14      | const,jointStory.lockroomId |    29 |    33.33 | Using index condition                                               |
|  3 | DERIVED      | Story           | NULL       | range  | lockroomId,combine_lockroomId_Index,combine_story_index                | combine_lockroomId_Index | 14      | NULL                        | 36332 |    33.33 | Using index condition; Using where; Using temporary; Using filesort |
|  4 | UNION        | Story           | NULL       | ref    | ownerId,ownerId_updatedAt,combine_lockroomId_Index,combine_story_index | ownerId                  | 8       | const                       |    15 |    16.66 | Using where                                                         |
| NULL | UNION RESULT | <union2,4>      | NULL       | ALL    | NULL                                                                   | NULL                     | NULL    | NULL                        |  NULL |     NULL | Using temporary; Using filesort                                     |
+----+--------------+-----------------+------------+--------+------------------------------------------------------------------------+--------------------------+---------+-----------------------------+-------+----------+---------------------------------------------------------------------+

Here is the result of SHOW TABLE STATUS

+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| Name  | Engine | Version | Row_format | Rows     | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation          | Checksum | Create_options | Comment |
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| Story | InnoDB |      10 | Dynamic    | 19350090 |            201 |  3895443456 |               0 |   4337303552 |   4194304 |       20043428 | 2020-09-18 17:03:14 | NULL        | NULL       | utf8mb4_unicode_ci |     NULL |                |         |
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+

The size of RAM is 2 GB. Here is the innodb_buffer_pool_size

+------------------------------------------+
| @@innodb_buffer_pool_size/1024/1024/1024 |
+------------------------------------------+
|                           1.000000000000 |
+------------------------------------------+

Change the type of column may be a good idea, however as I know the table will be lock when altering the type of column. As a result I may need to wait until the next DB update.

Here is the current Index set:

+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name                           | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Story |          0 | PRIMARY                            |            1 | storyId       | A         |    19350090 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          0 | StoryId_UNIQUE                     |            1 | storyId       | A         |    19350090 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | ownerId                            |            1 | ownerId       | A         |     1255716 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | parentStoryId                      |            1 | parentStoryId | A         |           1 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | updatedAt                          |            1 | updatedAt     | A         |    19350090 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | ownerId_updatedAt                  |            1 | ownerId       | A         |     1743714 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | ownerId_updatedAt                  |            2 | updatedAt     | A         |    19350090 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | updatedAt_ownerId_descending_index |            1 | updatedAt     | D         |    19139564 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | updatedAt_ownerId_descending_index |            2 | ownerId       | A         |    18636190 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | lockroomId                         |            1 | lockroomId    | A         |      328860 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | combine_lockroomId_Index           |            1 | storyType     | A         |         685 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | combine_lockroomId_Index           |            2 | lockroomId    | A         |      221470 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | combine_story_index                |            1 | storyType     | A         |         429 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | combine_story_index                |            2 | lockroomId    | A         |      665208 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | combine_story_index                |            3 | createdAt     | A         |    19350090 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | combine_story_index                |            4 | ownerId       | A         |    19350090 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+

The Number of Like is only a number(Int) that store in the column. And the comment column which only store the lastComment and secondLastComment. We have create another Table Story to store all the comments.

updatedAt and createdAt is to check when the user post a story.

Very appreciate for your reply.


Solution

  • I think that lockroomId IS NOT NULL is unnecessary because of lockroomId > 0

    Indexes:

    (storyType, lockroomId)
    (storyType, createdAt)
    

    Redundant (because the PRIMARY KEY is a UNIQUE KEY):

    UNIQUE KEY `StoryId_UNIQUE` (`storyId`),
    

    Does each lockroomId have both a storyType = 1 and = 2? If so, I want to do this query in 3 steps (instead of 2):

    1. Do the minimal amount of work to find the 10 lookroomIDs.
    2. JOIN Story to get Story.*
    3. JOIN again to get the rest of the GROUP_CONCATs

    Here's the rationale.

    First you are fetching the entire table, building GROUP_CONCATs, but all but 10 will eventually be thrown away. Instead, I hope to get only guestIds and filter on . Second, whittle it down to 10 rows.

    Then I would go back in to get the rest of the stuff.

    Does this get the right list of loockroomIds?

    SELECT  lockroomId, GROUP_CONCAT(Story.ownerId) AS guestIds
        FROM  Story
        WHERE  storyType = 1
          AND  lockroomId > 0
           AND  createdAt < FROM_UNIXTIME(1598882406)
        GROUP BY  lockroomId
        HAVING  FIND_IN_SET(<userId>, guestIds)
        ORDER BY  createdAt DESC
        LIMIT  10;
    

    With this covering index:

    INDEX(storyType, lookroomId, createdAt, ownerId)
    

    Before I continue, please see how fast that runs and whether it gets a valid list of lockroomIds.

    Improved

    You say that helped. Please provide EXPLAIN SELECT after the index change.

    Secondary suggestions

    Your table is rather large. Please provide SHOW TABLE STATUS, size of RAM, and innodb_buffer_pool_size. If possible, please check whether MySQL is swapping. This will give a clue of whether you are I/O-bound, leading to a shift in focus.

    I see a lot of BIGINTs. BIGINT takes 8 bytes; INT takes 4. Shrinking the data size helps performance some if I/O is involved.

    Do you always need to display the "comments"? Perhaps the 6 columns can be moved to a separate table?

    Moving "numlikes" to another table -- This may be beneficial, especially if you are frequently updating that column while doing the big select in question.

    Consider smaller datatypes for things like minutes. SMALLINT UNSIGNED can hold 2 months? (It's only 2 bytes.)

    Do you actually use created_at and updated_at? (`TIMESTAMP(6) takes 8 bytes each.)

    What is the current set of indexes? Let's see the second-worst query. (I don't want to wreck it while improving the current query.)

    More

    innodb_buffer_pool_size = 1G in only 2GB of RAM -- Sounds like you might be swapping. Swapping is very bad for performance. I suggest lowering that setting to 500M.

    The advice of 60-80% applies to larger servers. The OS takes some space. MySQL code takes some space. Other data structures take some space. And the web server. And your application code. Those may add up to more than 1GB. (I don't know exactly.) That leaves less than 1GB for the buffer pool. Set it to 500M, run for a few days, then if there is some spare RAM, you can increase the buffer_pool some.

    Swapping is worse for performance than having a smaller buffer_pool.

    Since the table is about 8GB, it is important to shrink the table. See suggestions above.

    Since combine_lockroomId_Index is the first part of combine_story_index, DROP combine_lockroomId_Index.