Tags: mysql, amazon-rds, database-performance, database-replication

Mysqldump on RDS Read Replica Slave is 50x slower


I created a read replica of my MySQL database on Amazon RDS.

When executing the following command, it is super fast (half a second) on the master, but takes more like 30 seconds on the slave. That's super annoying, because I wanted to dump off of the slave so that I don't slow down the master.

mysqldump --set-gtid-purged=OFF -h myDomain.com -u dev -pmyPassword mySchema > out.sql

Solution

  • There are three issues to consider.

    The most significant is that mysqldump does not perform well when run at a distance from the database, due to limitations in the traditional MySQL client/server wire protocol, which makes no allowance for pipelining a series of commands.

    The mysqldump utility uses no magic to generate dump files -- it issues SQL statements to the server, and takes the results of those queries to generate its output.

    As a result, every single object (schema, table, view, stored function/procedure, event) in the database requires at least one round trip and sometimes more than one.

    For each table, mysqldump first issues SHOW CREATE TABLE t1; followed by SELECT * FROM t1; ... so with a round trip time of 100 ms, extracting a dump file of 150 tables wastes 150 × 2 × 0.100 = 30 seconds purely on the distance between the machine running mysqldump and the server -- and this is true even if the tables are completely empty.
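
    As a rough sanity check, you can estimate the per-statement round trip yourself. The sketch below reuses the host and credentials from the question and is illustrative only; the timing of SELECT 1 also includes connection setup, so it slightly overestimates a single round trip.

    # Time a trivial statement against the replica to estimate one round trip.
    time mysql -h myDomain.com -u dev -pmyPassword -e "SELECT 1;"

    # mysqldump effectively issues this pair of statements per table:
    #   SHOW CREATE TABLE t1;
    #   SELECT * FROM t1;
    # so ~100 ms of latency × 2 statements × 150 tables ≈ 30 seconds of waiting.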

    This is not a recommendation, but you might take a look at mydumper, which claims to be able to create the backup using multiple database connections in parallel; this could help mitigate the time wasted as commands travel to the server and results return to the client. I don't know the quality of this code base, but something like this could help.
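
    For illustration only, a parallel dump with mydumper might look roughly like the following. I haven't vetted this tool, the output directory here is arbitrary, and the flag names can differ between mydumper versions, so check mydumper --help before relying on it.

    # Hypothetical mydumper invocation: 4 connections dumping mySchema in parallel.
    mydumper -h myDomain.com -u dev -p myPassword -B mySchema -t 4 -o /backup/mySchema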


    Next, you almost always want to use the --compress option for mysqldump. Contrary to what you might assume, this does not compress the backup file. The generated backup file is identical when this option is used, but when this feature is activated, the server compresses the data it sends to mysqldump on the wire, and mysqldump decompresses the data again before writing it out -- so this option will almost always make for a faster process unless the machine running mysqldump and the database server are connected by a low-latency, high-bandwidth network. Because the generated file is identical, there are no compatibility concerns when using this option.
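
    Applied to the command from the question, that is a single extra flag. (On recent MySQL 8.0 clients, --compress is deprecated in favor of --compression-algorithms, but both enable compression on the wire.) If you also want the file on disk compressed, that is a separate step, for example piping through gzip.

    # Same dump, but with the client/server traffic compressed in transit.
    mysqldump --set-gtid-purged=OFF --compress -h myDomain.com -u dev -pmyPassword mySchema > out.sql

    # Optionally compress the output file as well -- this is unrelated to --compress.
    mysqldump --set-gtid-purged=OFF --compress -h myDomain.com -u dev -pmyPassword mySchema | gzip > out.sql.gz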


    Finally, there's an issue with newly-created RDS servers that you need to be aware of, so that it doesn't skew your benchmarks. When you create an RDS replica, it is initially seeded with data from a snapshot of the upstream master. Behind the scenes, this is an EBS snapshot of the master's hard drive, and the new database instance is backed by an EBS volume restored from that snapshot. EBS volumes restored from snapshots are lazily loaded, so they have a documented first-touch penalty: the first read of each block has to fetch it from the snapshot, which is much slower than a normal read. This issue can have a substantial impact on the performance of the first complete backup, but should have no meaningful impact after that.
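
    If you want your benchmarks to reflect steady-state performance, one sketch of a workaround -- assuming you can afford a throwaway run -- is a first pass that reads all the data and discards it, so the volume blocks are loaded before you time anything.

    # Throwaway first pass: forces the replica's EBS volume to pull in the blocks
    # backing the data, so later dumps aren't skewed by the first-touch penalty.
    mysqldump --set-gtid-purged=OFF --compress -h myDomain.com -u dev -pmyPassword mySchema > /dev/null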