Search code examples
mysqldatabase-performance

Is MySQL optimize table request safe to use during write operations and in replication enviorement?


I have MySQL DB with up to 300 InnoDB tables with similar structure each has about 700k rows and size around 160M in normal case. This tables are independent of each other and has no foreign keys.

This tables are intensively used for RW operations the table usage pattern looks like this:

  1. Every 15 minutes new portion of data is inserted (300-1200 new rows) into each table
  2. Once per day the data which is older then 30 days is removed from the each table.
  3. Periodically data read from the random tables per user request.

After a few month the DB performance has dropped significantly. After investigation the problem was found: the table files has grown large and occupied about 300M of disk space each. After running manual table optimization the problem was solved but in a few weeks the DB performance started to degrade again.

As a solution to keep DB healthy the OPTIMIZE TABLE request was added after each data removal.

The questions are:

  1. Is there other solutions for MySQL performance degradation problem?
  2. Is it possible to lost data inserted into the table when OPTIMIZE TABLE is running (if INSERT request executed when OPTIMIZE TABLE request is still in progress)?
  3. Is OPTIMIZE TABLE request safe for unexpected MySQL server shutdowns (is it possible to lost commited data if PC power loss happens in case of INSERT operation was finished but OPTIMIZE TABLE request not)?
  4. Is OPTIMIZE TABLE request safe to use in case of simple replication (OPTIMIZE TABLE request executed on master server and replicated on slave which is used only as a backup facility and has no IO except of replication)?

Edit: MySQL 5.7.15, InnoDB tables. And replication environment is build using MySQL 8.0.4 RC.

Edit 2: table structure:

CREATE TABLE `data_2235353676` (
`id` BIGINT(20) NOT NULL,
`inst` VARCHAR(100) NULL DEFAULT NULL,
`if_i` BIGINT(20) NOT NULL,
`prt` BIGINT(20) NULL DEFAULT NULL,
`if_t` BIGINT(20) NULL DEFAULT NULL,
`path` BIGINT(20) NULL DEFAULT NULL,
`period` BIGINT(20) NOT NULL,
`type` BIGINT(20) NOT NULL,
`servicetype` INT(11) NOT NULL,
`sdv_time` BIGINT(20) NULL DEFAULT NULL,
`srv_time` BIGINT(20) NOT NULL,
`err_c` BIGINT(20) NULL DEFAULT NULL,
`err_s` BIGINT(20) NULL DEFAULT NULL,
`srv_err_s` BIGINT(20) NULL DEFAULT NULL,
`una_s` BIGINT(20) NULL DEFAULT NULL,
`valid` BIGINT(1) NULL DEFAULT NULL,
`r_err` BIGINT(20) NULL DEFAULT NULL,
`k_err` BIGINT(20) NULL DEFAULT NULL,
`l2CounterType` BIGINT(20) NOT NULL,
`l2Count` BIGINT(20) NULL DEFAULT NULL,
PRIMARY KEY (`id`, `if_i`, `period`, `type`, `servicetype`, `srv_time`, `l2CounterType`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
;

Solution

  • INSERT and OPTIMIZE fight each other; my suggestions will decrease this conflict in several ways, the most important being by getting rid of OPTIMIZE.

    GA for 8.0 is available -- Please update to it; your RC version should not be trusted in production.

    Which column controls "older then 30 days"? In a non-partitioned table, see if it is practical to put it first in the PRIMARY KEY. By "clustering" all the rows being inserted together, I/O is decreased significantly. In a partitioned table (as mentioned below), try to move that column to a later position in the PK. (The Partitioning will decrease the I/O, since you will be inserting only in the 'last' partition.)

    Make sure you are using innodb_file_per_table=ON.

    Don't blindly use 8-byte BIGINT; find a suitably flavor of INT that will be smaller, but with sufficient range.

    Use PARTITION BY RANGE(TO_DAYS(...)) to break the tables into about 32 partition. This will make the dropping of old data very fast and efficient, without the need for OPTIMIZE TABLE. More discussion: http://mysql.rjweb.org/doc.php/partitionmaint

    Note: Partitioned tables have some extra "free space", so my suggestion will still lead to about 300MB per table. However, the free space will not impact performance. If practical, have fewer partitions -- say, 12 3-day partitions.

    How are you doing the inserts? a single LOAD DATA? A 'batch' INSERT? (I hope you are not inserting one row at a time.)

    Are most of the columns really NULLable?

    OPTIMIZE is 'safe' in all respects since it locks the table, copies all the data over, then renames the new copy in place of the old. (Actually there is a tiny window of vulnerability, but 8.0 covers that with its "Data Dictionary".)