Tags: mysql, performance, ubuntu, disk-io

MySQL heavy disk activity even with no queries running


I'm trying to troubleshoot a mysterious disk I/O bottleneck caused by MySQL.

I'm using the following commands to test disk read/write speed:

#write
dd if=/dev/zero of=/tmp/writetest bs=1M count=1024 conv=fdatasync,notrunc

#read (drop the page cache first so the disk is measured rather than RAM; requires root)
echo 3 > /proc/sys/vm/drop_caches; dd if=/tmp/writetest of=/dev/null bs=1M count=1024

I rebooted the machine, disabled cron so none of my usual processes are running queries, killed the web server which usually runs, and killed mysqld.
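
For reference, quieting the box down to just the dd test amounts to a few service commands; this is only a rough sketch, assuming Ubuntu's service tool and an Apache web server (the web server name is an assumption, adjust to your setup):

#stop everything that might generate queries or background I/O
sudo service cron stop
sudo service apache2 stop
sudo service mysql stop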

When I run the read test without mysqld running, I get 1073741824 bytes (1.1 GB) copied, 2.19439 s, 489 MB/s. Consistently around 450-500 MB/s.

When I start the mysql service back up and run the read test again, I get 1073741824 bytes (1.1 GB) copied, 135.657 s, 7.9 MB/s. Consistently around 5 MB/s.

Running SHOW FULL PROCESSLIST in mysql doesn't show any queries (and I disabled everything that would be running queries anyway). In MySQL Workbench's Server Status tab, I can see InnoDB reads fluctuate between 30-200 reads per second and 3-15 writes per second, even when no queries are running.
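
As a sanity check outside Workbench, the same counters can be polled from the shell; a sketch using the standard InnoDB status variables (add -u/-p options as your setup requires):

#show per-interval deltas of the InnoDB data read/write counters every 5 seconds
mysqladmin -r -i 5 extended-status | grep -E 'Innodb_data_(reads|writes)'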

If I run iotop -oPa I can see that mysqld is racking up about 1 MB of disk reads per second when no queries are running. That seems like a lot considering no queries are running, but at the same time it doesn't seem like enough to make my dd command take so long... The only other thing performing disk I/O is jbd2/sda3-8.
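
If iotop's numbers look suspicious, pidstat (from the sysstat package) gives a second opinion on per-process I/O; a sketch, assuming sysstat is installed and a single mysqld process:

#per-process read/write rates for mysqld, sampled every second
pidstat -d -p $(pidof mysqld) 1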

Not sure if it's related, but if I try to stop the mysql server with service mysql stop, it says "Attempt to stop MySQL timed out" and the mysqld process keeps running, but I can no longer connect to the DB. I have to use kill -9 on the mysqld process and restart the server.

All of this came out of the blue. This server had been doing heavy-duty log parsing and high-volume inserts and selects for months, until this last weekend, when we started seeing this disk I/O bottleneck.

How can I find out why MySQL is doing so much disk reading when it's essentially idle?


Solution

  • This turned out, like many performance problems, to be a multifaceted issue.

    Essentially, the nightly system and db backups, which write to a separate HDD RAID array, were running into the next day; the master would then send FLUSH TABLES, causing mysql jobs and replication work to wait on that flush (a quick way to spot those waits is sketched at the end of this answer). On top of that, an unnecessary side process was copying many gigabytes of text files around the system a few times a day, and there was a ton of context switching as the system tried to copy data for the backups while also performing mysql work (replication and other jobs).

    I ended up reducing the number of tables we were replicating (some were unnecessary), cutting down the copying of text files around the system when it wasn't needed, increasing the memory and I/O allocated to the mysql server, streamlining the mysql and system backups, and limiting the cron jobs that ran mysql processes so the mysql backups had more time to complete. Even with all that, the backups were barely finishing by 7 AM each morning, so I decided to run the mysql backups only on weekends instead of nightly, which is fine since this is all fairly static data.
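
    For anyone hitting a similar stall: the FLUSH TABLES sent during the backups shows up as sessions stuck in a "Waiting for table flush" (or, on older versions, "Waiting for table") state, which you can spot with a quick query against the process list; a sketch, with credentials left to your setup:

    #list sessions blocked behind a table flush
    mysql -e "SELECT ID, USER, TIME, STATE, LEFT(INFO, 80) AS QUERY FROM information_schema.PROCESSLIST WHERE STATE LIKE 'Waiting for table%'"

    The weekend-only backup schedule itself is just a cron entry along these lines (the script path is a placeholder):

    #run the MySQL backup at 01:00 on Saturdays only
    0 1 * * 6 /usr/local/bin/mysql_backup.sh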