Search code examples
mysqlgrailsjdbccharacter-encoding

Does characterEncoding in connection string set how values is stored?


I'm about to change the encoding for a database from latin1 to utf8mb4. Due to privacy restrictions, I don't know what the database to be converted contains. I'm worried that by running below SQL, existing data may be changed.

ALTER TABLE table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

However, the connection string from the grails application contains useUnicode=true&characterEncoding=UTF-8, does this mean that even though latin1_swedish_ci is used for a column, the actual value that has been saved is UTF-8 encoded?

And since this value is UTF-8 encoded, there is no risk that the data will be affected by the change from latin1 to utf8mb4?

+--------------------------+-------------------+
| Variable_name            | Value             |
+--------------------------+-------------------+
| character_set_client     | utf8              |
| character_set_connection | utf8              |
| character_set_database   | latin1            |
| character_set_filesystem | binary            |
| character_set_results    | utf8              |
| character_set_server     | latin1            |
| character_set_system     | utf8              |
| collation_connection     | utf8_general_ci   |
| collation_database       | latin1_swedish_ci |
| collation_server         | latin1_swedish_ci |
+--------------------------+-------------------+```

Solution

  • That's Ώπα? That's the interpretation in UTF-8 (as the outside world calls it), utf8mb4 (MySQL's equivalent) or utf8 (MySQL's partial implementation of UTF-8).

    It would not work well in latin1.

    The encoding in the client and the encoding of a column in the database need not be the same. However, Greek in the client cannot be crammed into latin1 in the table, hence the error message.

    What ALTER TABLE table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci; does is to change all the text columns in that table to be utf8-encoded and convert from whatever encoding is currently used (presumably latin1). This is fine for Western European characters, all of which exist (with different encodings) in both latin1 and utf8.

    To handle Emoji and some of Chinese, you may as well go for utf8mb4:

    ALTER TABLE table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8_unicode_520_ci;