Search code examples
mysqlutf-8replication

MySQL convert to UTF8 without structure change


I have a rather large database that I am trying to convert from charset and collation latin1/latin1_swedish_ci to utf8mb4/utf8mb4_unicode_ci. I am hoping to setup replication to a slave, run the conversion, and then promote the slave when finished as to avoid downtime.

I noticed that when running the query...

ALTER TABLE `sometable` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

...MySQL automatically converts text to mediumtext or mediumtext to longtext, etc.

Is there a way to turn this feature off? It is nice that MySQL has this feature, but the problem is that it breaks replication because the structure of the tables on the slave is different from master.


Solution

  • As documented under ALTER TABLE Syntax:

    For a column that has a data type of VARCHAR or one of the TEXT types, CONVERT TO CHARACTER SET will change the data type as necessary to ensure that the new column is long enough to store as many characters as the original column. For example, a TEXT column has two length bytes, which store the byte-length of values in the column, up to a maximum of 65,535. For a latin1 TEXT column, each character requires a single byte, so the column can store up to 65,535 characters. If the column is converted to utf8, each character might require up to three bytes, for a maximum possible length of 3 × 65,535 = 196,605 bytes. That length will not fit in a TEXT column's length bytes, so MySQL will convert the data type to MEDIUMTEXT, which is the smallest string type for which the length bytes can record a value of 196,605. Similarly, a VARCHAR column might be converted to MEDIUMTEXT.

    To avoid data type changes of the type just described, do not use CONVERT TO CHARACTER SET. Instead, use MODIFY to change individual columns. For example:

    ALTER TABLE t MODIFY latin1_text_col TEXT CHARACTER SET utf8;
    ALTER TABLE t MODIFY latin1_varchar_col VARCHAR(M) CHARACTER SET utf8;