I have a table with 19489578 postings. When I query table with a select statement, group by lockroomId field of the postings table, it run in very slow query times (>3 second). Here's the schema of the story table:
CREATE TABLE `Story` (
`storyId` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`storyType` int(11) DEFAULT '0',
`parentStoryId` bigint(20) unsigned DEFAULT NULL,
`ownerId` bigint(20) unsigned NOT NULL,
`lockroomId` bigint(20) unsigned DEFAULT NULL,
`isHost` tinyint(1) DEFAULT '0',
`updatedAt` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
`createdAt` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
`message` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
`imageUrl` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`amount` int(11) DEFAULT NULL,
`location` varchar(45) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`privacy` tinyint(1) NOT NULL DEFAULT '0',
`minutes` int(11) DEFAULT '0',
`hasCoinBadge` tinyint(1) DEFAULT '0',
`hasFriendBadge` tinyint(1) DEFAULT '0',
`localDBId` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
`tagName` varchar(63) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT '',
`tagColor` varchar(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT '',
`tagId` bigint(20) DEFAULT NULL,
`numLikes` int(11) DEFAULT '0',
`numComments` int(11) DEFAULT '0',
`lastCommenterId` bigint(20) unsigned DEFAULT NULL,
`lastCommentTimestamp` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
`lastComment` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`secondLastCommenterId` bigint(20) unsigned DEFAULT NULL,
`secondLastCommentTimestamp` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
`secondLastComment` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`lastLikerId` bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (`storyId`),
UNIQUE KEY `StoryId_UNIQUE` (`storyId`),
KEY `ownerId` (`ownerId`),
KEY `parentStoryId` (`parentStoryId`),
KEY `updatedAt` (`updatedAt`),
KEY `ownerId_updatedAt` (`ownerId`,`updatedAt`),
KEY `updatedAt_ownerId_descending_index` (`updatedAt` DESC,`ownerId`),
KEY `lockroomId` (`lockroomId`),
CONSTRAINT `ownerId` FOREIGN KEY (`ownerId`) REFERENCES `User` (`userId`),
CONSTRAINT `parentStoryId` FOREIGN KEY (`parentStoryId`) REFERENCES `Story` (`storyId`)
) ENGINE=InnoDB AUTO_INCREMENT=19503337 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
+----------------------------+---------------------+------+-----+----------------------+-------------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+---------------------+------+-----+----------------------+-------------------+
| storyId | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| storyType | int(11) | YES | | 0 | |
| parentStoryId | bigint(20) unsigned | YES | MUL | NULL | |
| ownerId | bigint(20) unsigned | NO | MUL | NULL | |
| lockroomId | bigint(20) unsigned | YES | MUL | NULL | |
| isHost | tinyint(1) | YES | | 0 | |
| updatedAt | timestamp(6) | YES | MUL | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| createdAt | timestamp(6) | YES | | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| message | text | YES | | NULL | |
| imageUrl | varchar(255) | YES | | NULL | |
| amount | int(11) | YES | | NULL | |
| location | varchar(45) | YES | | NULL | |
| privacy | tinyint(1) | NO | | 0 | |
| minutes | int(11) | YES | | 0 | |
| hasCoinBadge | tinyint(1) | YES | | 0 | |
| hasFriendBadge | tinyint(1) | YES | | 0 | |
| localDBId | text | YES | | NULL | |
| tagName | varchar(63) | YES | | | |
| tagColor | varchar(15) | YES | | | |
| tagId | bigint(20) | YES | | NULL | |
| numLikes | int(11) | YES | | 0 | |
| numComments | int(11) | YES | | 0 | |
| lastCommenterId | bigint(20) unsigned | YES | | NULL | |
| lastCommentTimestamp | timestamp(6) | YES | | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| lastComment | varchar(256) | YES | | NULL | |
| secondLastCommenterId | bigint(20) unsigned | YES | | NULL | |
| secondLastCommentTimestamp | timestamp(6) | YES | | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| secondLastComment | varchar(256) | YES | | NULL | |
| lastLikerId | bigint(20) unsigned | YES | | NULL | |
+----------------------------+---------------------+------+-----+----------------------+-------------------+
I also add lockroomId as Index. Here is the index of Story Table:
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Story | 0 | PRIMARY | 1 | storyId | A | 18951234 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 0 | StoryId_UNIQUE | 1 | storyId | A | 19116608 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | ownerId | 1 | ownerId | A | 1212686 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | parentStoryId | 1 | parentStoryId | A | 1 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | updatedAt | 1 | updatedAt | A | 19022172 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | ownerId_updatedAt | 1 | ownerId | A | 945172 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | ownerId_updatedAt | 2 | updatedAt | A | 19112336 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | updatedAt_ownerId_descending_index | 1 | updatedAt | D | 19091560 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | updatedAt_ownerId_descending_index | 2 | ownerId | A | 18649716 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | lockroomId | 1 | lockroomId | A | 1 | NULL | NULL | YES | BTREE | | | YES | NULL |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
And here's the query that takes >3 second to execute. What this query do is to get the jointStory that user create together with same lockroomId but different storyId. As a result, I group the story by lockroomId.
SELECT
*
FROM
((SELECT
Story.*,
jointStory.guestIds,
jointStory.guestStoryIds,
jointStory.guestTrees,
jointStory.guestDurations,
jointStory.guestTagIds
FROM
Story
INNER JOIN (SELECT
lockroomId,
GROUP_CONCAT(Story.ownerId) AS guestIds,
GROUP_CONCAT(Story.storyId) AS guestStoryIds,
GROUP_CONCAT(Story.imageUrl) AS guestTrees,
GROUP_CONCAT(Story.minutes) AS guestDurations,
GROUP_CONCAT(Story.tagId) AS guestTagIds
FROM
Story
WHERE
Story.storyType = 1
AND lockroomId IS NOT NULL
AND createdAt < FROM_UNIXTIME(1600444809)
GROUP BY Story.lockroomId
HAVING FIND_IN_SET(1349147, guestIds) ORDER BY createdAt DESC
LIMIT 10) jointStory ON jointStory.lockroomId = Story.lockroomId
WHERE
Story.storyType = 2
AND Story.lockroomId IS NOT NULL
AND Story.createdAt < FROM_UNIXTIME(1600444809)
ORDER BY createdAt DESC
LIMIT 10) UNION ALL SELECT
Story.*, NULL, NULL, NULL, NULL, NULL
FROM
Story
WHERE
Story.storyType = 0
AND ownerId = 1349147
AND createdAt < FROM_UNIXTIME(1600444809)
ORDER BY createdAt DESC
LIMIT 10) stories
LEFT JOIN
StoryEngagement ON StoryEngagement.storyId = stories.storyId
AND StoryEngagement.userId = 1349147
ORDER BY createdAt DESC
LIMIT 10;
The result of the explain extended command on this query shows that MySQL is using filesort and using index condition:
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------------------+--------+----------+---------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------------------+--------+----------+---------------------------------------------------------------+
| 1 | PRIMARY | Story | NULL | range | lockroomId | lockroomId | 9 | NULL | 287540 | 3.33 | Using index condition; Using where; Using MRR; Using filesort |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 9 | flora_test.Story.lockroomId | 10 | 100.00 | NULL |
| 2 | DERIVED | Story | NULL | range | lockroomId | lockroomId | 9 | NULL | 287540 | 3.33 | Using index condition; Using where |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------------------+--------+----------+---------------------------------------------------------------+
It would be very appreciate if you have any ideas on how to optimize this query. It would be great if there is a way that do not need to create index and make this query run faster, because this is a huge production table and adding index take very long time, also adding some new index may cause other query run slower. I have already tried a few things such as creating a combined index on group by fields( index(createdAt,lockroomId)
) and using FORCE INDEX(createdAt,lockroomId)
as suggested by a few blog postings, but run even slower...
UPDATE
After applied the suggestion, here is the result of EXPLAIN SELECE
+----+--------------+-----------------+------------+--------+------------------------------------------------------------------------+--------------------------+---------+-----------------------------+-------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-----------------+------------+--------+------------------------------------------------------------------------+--------------------------+---------+-----------------------------+-------+----------+---------------------------------------------------------------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 12 | 100.00 | Using filesort |
| 1 | PRIMARY | StoryEngagement | NULL | eq_ref | PRIMARY | PRIMARY | 16 | const,stories.storyId | 1 | 100.00 | Using where |
| 2 | DERIVED | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | Story | NULL | ref | lockroomId,combine_lockroomId_Index,combine_story_index | combine_story_index | 14 | const,jointStory.lockroomId | 29 | 33.33 | Using index condition |
| 3 | DERIVED | Story | NULL | range | lockroomId,combine_lockroomId_Index,combine_story_index | combine_lockroomId_Index | 14 | NULL | 36332 | 33.33 | Using index condition; Using where; Using temporary; Using filesort |
| 4 | UNION | Story | NULL | ref | ownerId,ownerId_updatedAt,combine_lockroomId_Index,combine_story_index | ownerId | 8 | const | 15 | 16.66 | Using where |
| NULL | UNION RESULT | <union2,4> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary; Using filesort |
+----+--------------+-----------------+------------+--------+------------------------------------------------------------------------+--------------------------+---------+-----------------------------+-------+----------+---------------------------------------------------------------------+
Here is the result of SHOW TABLE STATUS
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| Story | InnoDB | 10 | Dynamic | 19350090 | 201 | 3895443456 | 0 | 4337303552 | 4194304 | 20043428 | 2020-09-18 17:03:14 | NULL | NULL | utf8mb4_unicode_ci | NULL | | |
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
The size of RAM is 2 GB. Here is the innodb_buffer_pool_size
+------------------------------------------+
| @@innodb_buffer_pool_size/1024/1024/1024 |
+------------------------------------------+
| 1.000000000000 |
+------------------------------------------+
Change the type of column may be a good idea, however as I know the table will be lock when altering the type of column. As a result I may need to wait until the next DB update.
Here is the current Index set:
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Story | 0 | PRIMARY | 1 | storyId | A | 19350090 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 0 | StoryId_UNIQUE | 1 | storyId | A | 19350090 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | ownerId | 1 | ownerId | A | 1255716 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | parentStoryId | 1 | parentStoryId | A | 1 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | updatedAt | 1 | updatedAt | A | 19350090 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | ownerId_updatedAt | 1 | ownerId | A | 1743714 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | ownerId_updatedAt | 2 | updatedAt | A | 19350090 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | updatedAt_ownerId_descending_index | 1 | updatedAt | D | 19139564 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | updatedAt_ownerId_descending_index | 2 | ownerId | A | 18636190 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | lockroomId | 1 | lockroomId | A | 328860 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | combine_lockroomId_Index | 1 | storyType | A | 685 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | combine_lockroomId_Index | 2 | lockroomId | A | 221470 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | combine_story_index | 1 | storyType | A | 429 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | combine_story_index | 2 | lockroomId | A | 665208 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | combine_story_index | 3 | createdAt | A | 19350090 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | combine_story_index | 4 | ownerId | A | 19350090 | NULL | NULL | | BTREE | | | YES | NULL |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
The Number of Like is only a number(Int) that store in the column. And the comment column which only store the lastComment and secondLastComment. We have create another Table Story
to store all the comments.
updatedAt
and createdAt
is to check when the user post a story.
Very appreciate for your reply.
I think that lockroomId IS NOT NULL
is unnecessary because of lockroomId > 0
Indexes:
(storyType, lockroomId)
(storyType, createdAt)
Redundant (because the PRIMARY KEY
is a UNIQUE KEY
):
UNIQUE KEY `StoryId_UNIQUE` (`storyId`),
Does each lockroomId
have both a storyType
= 1 and = 2? If so, I want to do this query in 3 steps (instead of 2):
JOIN
Story to get Story.*
JOIN
again to get the rest of the GROUP_CONCATs
Here's the rationale.
First you are fetching the entire table, building GROUP_CONCATs
, but all but 10 will eventually be thrown away. Instead, I hope to get only guestIds and filter on . Second, whittle it down to 10 rows.
Then I would go back in to get the rest of the stuff.
Does this get the right list of loockroomIds?
SELECT lockroomId, GROUP_CONCAT(Story.ownerId) AS guestIds
FROM Story
WHERE storyType = 1
AND lockroomId > 0
AND createdAt < FROM_UNIXTIME(1598882406)
GROUP BY lockroomId
HAVING FIND_IN_SET(<userId>, guestIds)
ORDER BY createdAt DESC
LIMIT 10;
With this covering index:
INDEX(storyType, lookroomId, createdAt, ownerId)
Before I continue, please see how fast that runs and whether it gets a valid list of lockroomIds.
Improved
You say that helped. Please provide EXPLAIN SELECT
after the index change.
Secondary suggestions
Your table is rather large. Please provide SHOW TABLE STATUS
, size of RAM, and innodb_buffer_pool_size
. If possible, please check whether MySQL is swapping. This will give a clue of whether you are I/O-bound, leading to a shift in focus.
I see a lot of BIGINTs
. BIGINT
takes 8 bytes; INT
takes 4. Shrinking the data size helps performance some if I/O is involved.
Do you always need to display the "comments"? Perhaps the 6 columns can be moved to a separate table?
Moving "numlikes" to another table -- This may be beneficial, especially if you are frequently updating that column while doing the big select in question.
Consider smaller datatypes for things like minutes
. SMALLINT UNSIGNED
can hold 2 months? (It's only 2 bytes.)
Do you actually use created_at
and updated_at
? (`TIMESTAMP(6) takes 8 bytes each.)
What is the current set of indexes? Let's see the second-worst query. (I don't want to wreck it while improving the current query.)
More
innodb_buffer_pool_size = 1G
in only 2GB of RAM -- Sounds like you might be swapping. Swapping is very bad for performance. I suggest lowering that setting to 500M.
The advice of 60-80% applies to larger servers. The OS takes some space. MySQL code takes some space. Other data structures take some space. And the web server. And your application code. Those may add up to more than 1GB. (I don't know exactly.) That leaves less than 1GB for the buffer pool. Set it to 500M, run for a few days, then if there is some spare RAM, you can increase the buffer_pool some.
Swapping is worse for performance than having a smaller buffer_pool.
Since the table is about 8GB, it is important to shrink the table. See suggestions above.
Since combine_lockroomId_Index
is the first part of combine_story_index
, DROP combine_lockroomId_Index
.