Tags: mysql, innodb

MySQL table size much larger than size of data in it, causing index scanning queries to be slow


I have a table that is inserted into and deleted from at a high rate. The number of rows is never more than a few hundred, so I wouldn't expect it to occupy more than a few megabytes, yet the table's on-disk size is 20 gigabytes and growing. The server is MySQL 5.6.35 and the table uses the InnoDB engine. I would expect new rows to be inserted into the space left by deleted rows, but that doesn't appear to be happening. The table grows to this size within a week, and at some point a background process seems to start and slowly shrinks it by a few gigabytes.
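To quantify the bloat, you can compare the table's allocated size with the space InnoDB reports as reclaimable. A sketch, assuming a table named `my_table` in schema `my_db` (substitute your own names):

```sql
-- Allocated data/index size vs. reported free space for one table
-- (schema and table names are placeholders)
SELECT table_name,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb,
       ROUND(data_free    / 1024 / 1024) AS free_mb
FROM information_schema.TABLES
WHERE table_schema = 'my_db'
  AND table_name   = 'my_table';
```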

The problem with the table being so large is that any query that scans the whole table, such as SELECT COUNT(*), runs very slowly.

Is there something I can change to prevent the table growing like this?


Solution

  • InnoDB does free space as you delete, but not instantly. A DELETE only marks records as deleted; the space is reclaimed later by a background purge thread. If you keep inserting rapidly, new rows need space that the deleted rows still occupy, because they haven't been purged yet, so InnoDB expands the tablespace, at least part of the time.

    For example, the MySQL manual explains:

    If you insert and delete rows in smallish batches at about the same rate in the table, the purge thread can start to lag behind and the table can grow bigger and bigger because of all the “dead” rows, making everything disk-bound and very slow. In such a case, throttle new row operations, and allocate more resources to the purge thread by tuning the innodb_max_purge_lag system variable. See Section 14.14, “InnoDB Startup Options and System Variables” for more information.
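    The manual's suggestion can be sketched as follows; the threshold value is purely illustrative, not a tuned recommendation:

    ```sql
    -- Throttle DML when the purge backlog exceeds ~1M undo records
    -- (the value 1000000 is an illustrative assumption)
    SET GLOBAL innodb_max_purge_lag = 1000000;
    ```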

    I have never encountered a site that sets innodb_max_purge_lag, because it means their writes to the database could sometimes be deliberately slowed down. They don't usually want that.

    What I've done instead is set innodb_purge_threads=4 rather than the default of 1. That helps the purge work through its backlog quickly when it runs.
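    In my.cnf that looks like the sketch below. Note that in MySQL 5.6, innodb_purge_threads is not a dynamic variable, so changing it requires a server restart:

    ```ini
    # my.cnf -- run purge with more threads (the 5.6 default is 1)
    [mysqld]
    innodb_purge_threads = 4
    ```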

    Purging is further delayed if your application keeps transactions open that need those deleted rows to satisfy their REPEATABLE-READ view of the database. It doesn't matter how fast the purge is or how many purge threads you have if open transactions' snapshot requirements prevent purging.
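    You can check whether an old open transaction is holding back purge by querying information_schema.innodb_trx; a sketch (the 60-second threshold is an arbitrary assumption):

    ```sql
    -- Transactions that have been open for more than 60 seconds
    SELECT trx_id, trx_started, trx_mysql_thread_id, trx_query
    FROM information_schema.innodb_trx
    WHERE trx_started < NOW() - INTERVAL 60 SECOND
    ORDER BY trx_started;
    ```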

    You should commit transactions promptly, and not leave them open for indefinite amounts of time. Or change your transaction isolation level to READ-COMMITTED so your transactions don't need the database to retain old record versions.
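    A sketch of the isolation-level change; it can be set per session, or in my.cnf for all new connections:

    ```sql
    -- Per session, before starting the transaction:
    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

    -- Or in my.cnf for all new connections:
    --   [mysqld]
    --   transaction-isolation = READ-COMMITTED
    ```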