Tags: mysql, utf-8, configuration, mariadb, information-schema

How can I set MySQL's default collation for utf8 to utf8_unicode_ci?


I'm converting a database to the utf8 character set and the utf8_unicode_ci collation. When I alter a table's character set to utf8, MySQL automatically converts the table's columns to the default collation for utf8, utf8_general_ci. I don't want to run hundreds of ALTER commands to convert every column to utf8_unicode_ci. Can I instead change the default collation for utf8 to utf8_unicode_ci, that is, the collation flagged as default in information_schema?

SELECT * FROM information_schema.COLLATIONS WHERE CHARACTER_SET_NAME = 'utf8';

+---------------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME            | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN |
+---------------------------+--------------------+-----+------------+-------------+---------+
| utf8_general_ci           | utf8               |  33 | Yes        | Yes         |       1 |
| utf8_bin                  | utf8               |  83 |            | Yes         |       1 |
| utf8_unicode_ci           | utf8               | 192 |            | Yes         |       8 |
| utf8_icelandic_ci         | utf8               | 193 |            | Yes         |       8 |
| utf8_latvian_ci           | utf8               | 194 |            | Yes         |       8 |
| utf8_romanian_ci          | utf8               | 195 |            | Yes         |       8 |
| utf8_slovenian_ci         | utf8               | 196 |            | Yes         |       8 |
| utf8_polish_ci            | utf8               | 197 |            | Yes         |       8 |
| utf8_estonian_ci          | utf8               | 198 |            | Yes         |       8 |
| utf8_spanish_ci           | utf8               | 199 |            | Yes         |       8 |
| utf8_swedish_ci           | utf8               | 200 |            | Yes         |       8 |
| utf8_turkish_ci           | utf8               | 201 |            | Yes         |       8 |
| utf8_czech_ci             | utf8               | 202 |            | Yes         |       8 |
| utf8_danish_ci            | utf8               | 203 |            | Yes         |       8 |
| utf8_lithuanian_ci        | utf8               | 204 |            | Yes         |       8 |
| utf8_slovak_ci            | utf8               | 205 |            | Yes         |       8 |
| utf8_spanish2_ci          | utf8               | 206 |            | Yes         |       8 |
| utf8_roman_ci             | utf8               | 207 |            | Yes         |       8 |
| utf8_persian_ci           | utf8               | 208 |            | Yes         |       8 |
| utf8_esperanto_ci         | utf8               | 209 |            | Yes         |       8 |
| utf8_hungarian_ci         | utf8               | 210 |            | Yes         |       8 |
| utf8_sinhala_ci           | utf8               | 211 |            | Yes         |       8 |
| utf8_german2_ci           | utf8               | 212 |            | Yes         |       8 |
| utf8_croatian_mysql561_ci | utf8               | 213 |            | Yes         |       8 |
| utf8_unicode_520_ci       | utf8               | 214 |            | Yes         |       8 |
| utf8_vietnamese_ci        | utf8               | 215 |            | Yes         |       8 |
| utf8_general_mysql500_ci  | utf8               | 223 |            | Yes         |       1 |
| utf8_croatian_ci          | utf8               | 576 |            | Yes         |       8 |
| utf8_myanmar_ci           | utf8               | 577 |            | Yes         |       8 |
+---------------------------+--------------------+-----+------------+-------------+---------+

Note the IS_DEFAULT column.

Please also note that I'm not asking how to convert a database, table or column using ALTER!

Additionally, adding collation_server = utf8_unicode_ci to my.cnf does not work.


Solution

  • You need one ALTER per table, not per column:

    ALTER TABLE foo CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    

    You can generate all the ALTER statements, then copy and execute them manually. Something like:

    -- Emits one ALTER statement per base table outside the system schemas.
    SELECT CONCAT('ALTER TABLE `', table_schema, '`.`', table_name,
                  '` CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;')
        FROM information_schema.tables
        WHERE table_type = 'BASE TABLE'   -- skip views
          AND table_schema NOT IN ('mysql', 'information_schema',
                                   'performance_schema', 'sys');
    

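    But I suggest you use CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci instead, so that you can handle all of Chinese, plus Emoji. MySQL's utf8 stores at most 3 bytes per character, so it cannot hold the 4-byte characters that utf8mb4 can.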
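
    If some tables may already be converted, a variant of the generator query can skip them. This is only a sketch: it compares each table's default collation (table_collation), so a table whose default already matches but whose individual columns differ would be missed. The schema name mydb is a placeholder, not something from the question.

    ```sql
    -- Emit one ALTER per base table in `mydb` (hypothetical schema name)
    -- whose default collation is not yet the target collation.
    SELECT CONCAT('ALTER TABLE `', table_schema, '`.`', table_name,
                  '` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;')
               AS stmt
        FROM information_schema.tables
        WHERE table_type = 'BASE TABLE'
          AND table_schema = 'mydb'
          AND table_collation <> 'utf8mb4_unicode_520_ci';
    ```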

    I hope you did CONVERT TO, not just MODIFY COLUMN. The former converts the characters; the latter will make a mess of any 8-bit characters already in the table.

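    One gotcha with utf8mb4 happens if you have indexes on VARCHAR(255): at up to 4 bytes per character, such a column can exceed the 767-byte index key limit of older InnoDB row formats (191 * 4 = 764). If practical, shrink the size to 191 or less.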
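
    To see which indexes might be affected before converting, something like the following could list indexed CHAR/VARCHAR columns whose indexed length exceeds 191 characters. It is a sketch that assumes the 767-byte key limit applies to your tables, which depends on the InnoDB row format and server version. sub_part is the index prefix length and is NULL when the whole column is indexed.

    ```sql
    -- Indexed character columns whose indexed length is over 191 characters.
    SELECT s.table_schema, s.table_name, s.index_name, s.column_name,
           c.character_maximum_length, s.sub_part
        FROM information_schema.statistics AS s
        JOIN information_schema.columns AS c
          ON  c.table_schema = s.table_schema
          AND c.table_name   = s.table_name
          AND c.column_name  = s.column_name
        WHERE c.data_type IN ('char', 'varchar')
          -- Use the prefix length when the index has one, else the full column length.
          AND COALESCE(s.sub_part, c.character_maximum_length) > 191
          AND s.table_schema NOT IN ('mysql', 'information_schema',
                                     'performance_schema', 'sys');
    ```

    Each row returned is a candidate for shrinking the column or indexing only a prefix of it.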

    Example

    mysql> SHOW CREATE TABLE iidr\G
    *************************** 1. row ***************************
           Table: iidr
    Create Table: CREATE TABLE `iidr` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `key2` int(10) unsigned NOT NULL,
      `vc` varchar(99) DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `key2` (`key2`)
    ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)
    
    mysql> SHOW FULL COLUMNS FROM iidr;
    +-------+------------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
    | Field | Type             | Collation       | Null | Key | Default | Extra          | Privileges                      | Comment |
    +-------+------------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
    | id    | int(10) unsigned | NULL            | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
    | key2  | int(10) unsigned | NULL            | NO   | UNI | NULL    |                | select,insert,update,references |         |
    | vc    | varchar(99)      | utf8_general_ci | YES  |     | NULL    |                | select,insert,update,references |         |
    +-------+------------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
    3 rows in set (0.00 sec)
    
    mysql> ALTER TABLE iidr CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
    Query OK, 2 rows affected (0.14 sec)
    Records: 2  Duplicates: 0  Warnings: 0
    
    mysql> SHOW FULL COLUMNS FROM iidr;
    +-------+------------------+------------------------+------+-----+---------+----------------+---------------------------------+---------+
    | Field | Type             | Collation              | Null | Key | Default | Extra          | Privileges                      | Comment |
    +-------+------------------+------------------------+------+-----+---------+----------------+---------------------------------+---------+
    | id    | int(10) unsigned | NULL                   | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
    | key2  | int(10) unsigned | NULL                   | NO   | UNI | NULL    |                | select,insert,update,references |         |
    | vc    | varchar(99)      | utf8mb4_unicode_520_ci | YES  |     | NULL    |                | select,insert,update,references |         |
    +-------+------------------+------------------------+------+-----+---------+----------------+---------------------------------+---------+
    3 rows in set (0.00 sec)
    
    mysql> SHOW CREATE TABLE iidr\G
    *************************** 1. row ***************************
           Table: iidr
    Create Table: CREATE TABLE `iidr` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `key2` int(10) unsigned NOT NULL,
      `vc` varchar(99) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `key2` (`key2`)
    ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci
    1 row in set (0.00 sec)
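
    Rather than running SHOW FULL COLUMNS table by table, a schema-wide check along these lines could confirm the result after all the ALTERs have run. It is a sketch that assumes utf8mb4_unicode_520_ci is the target collation, as in the example; it also reports columns of views, which the ALTERs do not touch.

    ```sql
    -- Character columns that still use a collation other than the target.
    -- collation_name is NULL for non-character columns, so those are skipped.
    SELECT table_schema, table_name, column_name, collation_name
        FROM information_schema.columns
        WHERE collation_name IS NOT NULL
          AND collation_name <> 'utf8mb4_unicode_520_ci'
          AND table_schema NOT IN ('mysql', 'information_schema',
                                   'performance_schema', 'sys');
    ```

    An empty result means every character column outside the system schemas uses the target collation.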