Search code examples
mysqlamazon-rds

simple update query taking very long to execute in MySQL (wait innodb)


I have a table with 54k rows, contains 10G of data

I am running this update query on it:

UPDATE my_table SET blog_object_version='19'

takes more than 1 hour to run,
how can I improve performance?

additional information:

I am running on AMAZON rds, db.m5.4xlarge
this is my instance: enter image description here

this is what I see in the aws performance insights:
wait/io/file/innodb/innodb_data_file

enter image description here

I do not have any other queries running on my db:

mysql> show processlist;
+----+----------+---------------------+----------+---------+------+----------+----------------------------------------------+
| Id | User     | Host                | db       | Command | Time | State    | Info                                         |
+----+----------+---------------------+----------+---------+------+----------+----------------------------------------------+
|  3 | rdsadmin | localhost:65182     | NULL     | Sleep   |    0 |          | NULL                                         |
|  4 | rdsadmin | localhost           | NULL     | Sleep   |    1 |          | NULL                                         |
|  6 | admin    | 123.45.67.890:6170  | my_table | Query   | 3901 | updating | UPDATE my_table SET blog_object_version='19' |
| 12 | admin    | 123.45.67.890:6360  | NULL     | Sleep   | 2981 |          | NULL                                         |
| 18 | admin    | 123.45.67.890:7001  | NULL     | Query   |    0 | starting | show processlist                             |
+----+----------+---------------------+----------+---------+------+----------+----------------------------------------------+

and this is my table:

mysql> show create table my_table\G;
*************************** 1. row ***************************
       Table: my_table
Create Table: CREATE TABLE `my_table` (
  `index` int(11) NOT NULL AUTO_INCREMENT,
  `id` varchar(100) DEFAULT NULL,
  `user_id` varchar(50) NOT NULL,
  `associate_object_id` varchar(50) NOT NULL,
  `type` varchar(50) DEFAULT NULL,
  `creation_date` datetime DEFAULT NULL,
  `version_id` varchar(50) NOT NULL,
  `blog_object` longtext,
  `blog_object_version` varchar(100) DEFAULT NULL,
  `last_update` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`index`),
  UNIQUE KEY `id_user_id_version_id` (`id`,`user_id`,`version_id`) USING BTREE,
  KEY `user_id_associate_object_id` (`user_id`,`associate_object_id`),
  KEY `user_id_associate_object_id_version_id` (`user_id`,`associate_object_id`,`version_id`)
) ENGINE=InnoDB AUTO_INCREMENT=54563151 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

these are my indexes:

mysql> SHOW INDEX FROM my_table;
+----------+------------+----------------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name                               | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| my_table |          0 | PRIMARY                                |            1 | index               | A         |       43915 |     NULL | NULL   |      | BTREE      |         |               |
| my_table |          0 | id_user_id_version_id                  |            1 | id                  | A         |        3659 |     NULL | NULL   | YES  | BTREE      |         |               |
| my_table |          0 | id_user_id_version_id                  |            2 | user_id             | A         |        8783 |     NULL | NULL   |      | BTREE      |         |               |
| my_table |          0 | id_user_id_version_id                  |            3 | version_id          | A         |       43915 |     NULL | NULL   |      | BTREE      |         |               |
| my_table |          1 | user_id_associate_object_id            |            1 | user_id             | A         |         378 |     NULL | NULL   |      | BTREE      |         |               |
| my_table |          1 | user_id_associate_object_id            |            2 | associate_object_id | A         |        4391 |     NULL | NULL   |      | BTREE      |         |               |
| my_table |          1 | user_id_associate_object_id_version_id |            1 | user_id             | A         |         385 |     NULL | NULL   |      | BTREE      |         |               |
| my_table |          1 | user_id_associate_object_id_version_id |            2 | associate_object_id | A         |        6273 |     NULL | NULL   |      | BTREE      |         |               |
| my_table |          1 | user_id_associate_object_id_version_id |            3 | version_id          | A         |       43915 |     NULL | NULL   |      | BTREE      |         |               |
+----------+------------+----------------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Solution

  • Very basic issue, with a very basic solution:

    INDEX(blog_object_version)
    

    Why? Without this index, the UPDATE must read every one of the 54K (or 54M?) rows to check for '19'.

    With that index, only the relevant rows need to be read.

    Tips:

    Many of the VARCHAR columns sound like they should be INT (or something smaller, like SMALLINT)? (Changing the types is not likely to speed up the query.)

    Toss user_id_associate_object_id; the index user_id_associate_object_id_version_id handles the same things.

    Update all rows

    Updating up to 1K rows is reasonable. Updating less than 20% of the table will probably use the index if it is suitable.

    But... If you need to update all of 54K rows, there are a couple of issues.

    It will take a long time and probably a lot of disk space because both the old and new copies are held on to until the Update is finished. (This is so that it can commit or rollback the entire Update atomically.)

    Generally, it is "poor design" to ever need to update a column in all rows of an entire table. Sometimes, it may be possible put the column in another table in a single row. Then it is a one-row query to update blog_object_version. But it means doing a JOIN when you need it in a SELECT. (This may not be a problem.) And if you aren't changing all the rows, then it is messier.

    So,... If you decide to update "a lot of" (or all of) a big table, I recommend doing it in chunks of 100-1000 rows each. More details: http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks

    Change Buffer

    Another issue (less important) is that when updating an a non-unique indexed column, the index needs updating. That requires modifications to the BTree that represents the INDEX. For non-unique indexes, this is done in the background, mostly after the query is commited.

    There is no risk of having an incorrect index before it finishes updating the BTree. This is because of the "Change buffer". That construct keeps pending index updates for later persisting to disk.