I will have to convert the database encoding from latin-1 to utf-8.
I'm aware of the fact that converting the database is done via the command of
ALTER DATABASE db_name
[[DEFAULT] CHARACTER SET charset_name]
[[DEFAULT] COLLATE collation_name]
Source and converting an existing table is done via the command of
ALTER TABLE tbl_name
[[DEFAULT] CHARACTER SET charset_name]
[COLLATE collation_name]
However, the database is already existent and there is sensitive information involved. My question is whether the data I already have will be changed. The purpose of this question is that I have to give an estimate before I do the change.
Every (character string-type) column has its own character set and collation metadata.
If, when the column's data type was specified (i.e. when it was last created or altered), no character set/collation was explicitly given, then the table's default character set and collation would be used for the column.
If, when the table was specified, no default character set/collation was explicitly given, then the database's default character set and collation would be used for the table's default.
The commands that you quote in your question merely alter such default character sets/collations for the database and table respectively. In other words, they will only affect tables and columns that are created thereafter—they will not affect existing columns (or data).
To update existing data, you should first read the Changing the Character Set section of the manual page on ALTER TABLE
:
Changing the Character Set
To change the table default character set and all character columns (
CHAR
,VARCHAR
,TEXT
) to a new character set, use a statement like this:ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;
The statement also changes the collation of all character columns. If you specify no
COLLATE
clause to indicate which collation to use, the statement uses default collation for the character set. If this collation is inappropriate for the intended table use (for example, if it would change from a case-sensitive collation to a case-insensitive collation), specify a collation explicitly.For a column that has a data type of
VARCHAR
or one of theTEXT
types,CONVERT TO CHARACTER SET
changes the data type as necessary to ensure that the new column is long enough to store as many characters as the original column. For example, aTEXT
column has two length bytes, which store the byte-length of values in the column, up to a maximum of 65,535. For alatin1
TEXT
column, each character requires a single byte, so the column can store up to 65,535 characters. If the column is converted toutf8
, each character might require up to three bytes, for a maximum possible length of 3 × 65,535 = 196,605 bytes. That length does not fit in aTEXT
column's length bytes, so MySQL converts the data type toMEDIUMTEXT
, which is the smallest string type for which the length bytes can record a value of 196,605. Similarly, aVARCHAR
column might be converted toMEDIUMTEXT
.To avoid data type changes of the type just described, do not use
CONVERT TO CHARACTER SET
. Instead, useMODIFY
to change individual columns. For example:ALTER TABLE t MODIFY latin1_text_col TEXT CHARACTER SET utf8; ALTER TABLE t MODIFY latin1_varchar_col VARCHAR(M) CHARACTER SET utf8;
If you specify
CONVERT TO CHARACTER SET binary
, theCHAR
,VARCHAR
, andTEXT
columns are converted to their corresponding binary string types (BINARY
,VARBINARY
,BLOB
). This means that the columns no longer will have a character set attribute and a subsequentCONVERT TO
operation will not apply to them.If
charset_name
isDEFAULT
in aCONVERT TO CHARACTER SET
operation, the character set named by thecharacter_set_database
system variable is used.Warning
The
CONVERT TO
operation converts column values between the original and named character sets. This is not what you want if you have a column in one character set (likelatin1
) but the stored values actually use some other, incompatible character set (likeutf8
). In this case, you have to do the following for each such column:ALTER TABLE t1 CHANGE c1 c1 BLOB; ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
The reason this works is that there is no conversion when you convert to or from
BLOB
columns.To change only the default character set for a table, use this statement:
ALTER TABLE tbl_name DEFAULT CHARACTER SET charset_name;
The word
DEFAULT
is optional. The default character set is the character set that is used if you do not specify the character set for columns that you add to a table later (for example, withALTER TABLE ... ADD column
).When the
foreign_key_checks
system variable is enabled, which is the default setting, character set conversion is not permitted on tables that include a character string column used in a foreign key constraint. The workaround is to disableforeign_key_checks
before performing the character set conversion. You must perform the conversion on both tables involved in the foreign key constraint before re-enablingforeign_key_checks
. If you re-enableforeign_key_checks
after converting only one of the tables, anON DELETE CASCADE
orON UPDATE CASCADE
operation could corrupt data in the referencing table due to implicit conversion that occurs during these operations (Bug #45290, Bug #74816).