MySQL REORGANIZE algorithm


I have a question about partition merging in MySQL/MariaDB.

My table looks like this:

PARTITION BY RANGE (TO_DAYS(`mydate`))
(
PARTITION p_first VALUES LESS THAN (TO_DAYS('2019-03-01')) ENGINE=InnoDB,
PARTITION p201903 VALUES LESS THAN (TO_DAYS('2019-04-01')) ENGINE=InnoDB,
PARTITION p201904 VALUES LESS THAN (TO_DAYS('2019-05-01')) ENGINE=InnoDB,
PARTITION p201905 VALUES LESS THAN (TO_DAYS('2019-06-01')) ENGINE=InnoDB,
PARTITION p201906 VALUES LESS THAN (TO_DAYS('2019-07-01')) ENGINE=InnoDB,
PARTITION p201907 VALUES LESS THAN (TO_DAYS('2019-08-01')) ENGINE=InnoDB,
PARTITION p201908 VALUES LESS THAN (TO_DAYS('2019-09-01')) ENGINE=InnoDB,
PARTITION p_future VALUES LESS THAN (MAXVALUE) ENGINE=InnoDB
);

The p_first partition is really huge (500M rows), and the partitioning is useful for fetching data from the last few months.

Every month we merge the oldest monthly partition (p201903) into p_first and split p_future to create a new partition (p201909), using these queries:

ALTER TABLE mytable REORGANIZE PARTITION p_first, p201903 INTO ( PARTITION p_first VALUES LESS THAN (TO_DAYS('2019-04-01')) );

and

ALTER TABLE mytable REORGANIZE PARTITION p_future INTO ( PARTITION p201909 VALUES LESS THAN (TO_DAYS('2019-10-01')) ENGINE=InnoDB, PARTITION p_future VALUES LESS THAN (MAXVALUE) ENGINE=InnoDB );

How does MySQL technically do that? Does it create a new temporary partition and merge the listed partitions into it, or does it fill one partition with the contents of the other? And if so, is it better to reverse the order of the partitions in the statement? For example:

ALTER TABLE mytable REORGANIZE PARTITION p201903, p_first INTO ( PARTITION p_first VALUES LESS THAN (TO_DAYS('2019-04-01')) );

Thanks for your answers


Solution

  • REORGANIZE copies all the data from the source partition(s) into the destination partition(s). Since it involves p_first and that partition is huge, it takes a long time, and locks the table.

    From another point of view... MySQL "plays it safe". By creating the new partitions through a complete copy of the necessary data, it ensures that the old partitioning is still intact if a crash occurs. If, instead, it had tried to "merge" the partitions, a crash could leave the table corrupted. Safety (and simplicity) is preferred over speed.
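
    To gauge how much data a given REORGANIZE would have to copy, the approximate size of each partition can be read from information_schema. This is only a sketch: the table name mytable is taken from the question, the current schema is assumed to be the one holding the table, and TABLE_ROWS is an estimate for InnoDB rather than an exact count.

    ```sql
    SELECT PARTITION_NAME,
           PARTITION_DESCRIPTION,   -- upper bound of the range (the TO_DAYS value)
           TABLE_ROWS               -- estimated row count for InnoDB
    FROM information_schema.PARTITIONS
    WHERE TABLE_SCHEMA = DATABASE()   -- assumes the table lives in the current schema
      AND TABLE_NAME = 'mytable'
    ORDER BY PARTITION_ORDINAL_POSITION;
    ```

    Every partition named in the REORGANIZE list contributes its rows to the copy, so any statement that names p_first has to rewrite roughly 500M rows.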

    Instead of continually adding to that huge partition, add to the second partition. Or, think of it as building yearly partitions. More specifically, leave p_first alone and add p201904 to p201903. Next month, add p201905 to p201903. At the end of the year, switch to a 'better' naming convention of p2020 to collect the 12 months of the year 2020. Etc.

    After a decade, you will have about 18 partitions; this is not "bad", and you will see very little difference in normal operations due to 18 instead of 8 partitions.
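
    As a sketch of the monthly job under this scheme, using the partition names and boundaries from the question (the statements are illustrative and have not been run against the real table):

    ```sql
    -- Leave p_first alone; fold the next month into p201903 instead.
    -- Only the rows of p201903 and p201904 need to be copied.
    ALTER TABLE mytable REORGANIZE PARTITION p201903, p201904 INTO (
        PARTITION p201903 VALUES LESS THAN (TO_DAYS('2019-05-01')) ENGINE=InnoDB
    );

    -- The following month, the same pattern absorbs p201905.
    ALTER TABLE mytable REORGANIZE PARTITION p201903, p201905 INTO (
        PARTITION p201903 VALUES LESS THAN (TO_DAYS('2019-06-01')) ENGINE=InnoDB
    );
    ```

    Each merge still gets more expensive through the year as p201903 accumulates months, but the amount copied stays bounded by about one year of data instead of the whole history.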

    "... useful to get data from last months."

    Only if you are scanning the entire month's partition. Let me see the query that benefits from your monthly partitioning. In almost every case, partitioning does not provide performance benefits. It is likely to require changes to the indexes to make a non-partitioned table work as fast as, or faster than, a partitioned one.
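
    One way to check whether a given query benefits is to look at which partitions it touches. The query below is hypothetical, since the question does not show the real one. On MySQL 5.7 and later, plain EXPLAIN includes a partitions column; older MySQL versions, and as far as I know MariaDB, use EXPLAIN PARTITIONS instead.

    ```sql
    -- Hypothetical query: count the rows for a single month.
    -- The "partitions" column of the output lists the partitions that would be read.
    EXPLAIN
    SELECT COUNT(*)
    FROM mytable
    WHERE mydate >= '2019-08-01'
      AND mydate <  '2019-09-01';
    ```

    Comparing this plan with the plan for the same query on a non-partitioned copy of the table that has a suitable index on mydate would show whether the partitioning is actually doing the work.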