Search code examples
sqlmysqlcharacter-encoding

How to handle character encoding correctly in MySQL 8.0?


I'm simply trying to create a new table and import some values into it, but I have an encoding issue that I can't seem to figure out.

Here is the character_set and collation information regarding my environment:

mysql> show variables like 'char%';
+--------------------------+-------------------------------------------------+
| Variable_name            | Value                                           |
+--------------------------+-------------------------------------------------+
| character_set_client     | cp850                                           |
| character_set_connection | cp850                                           |
| character_set_database   | latin1                                          |
| character_set_filesystem | binary                                          |
| character_set_results    | cp850                                           |
| character_set_server     | utf8mb4                                         |
| character_set_system     | utf8mb3                                         |
| character_sets_dir       | c:\wamp64\bin\mysql\mysql8.0.31\share\charsets\ |
+--------------------------+-------------------------------------------------+
8 rows in set, 1 warning (0.00 sec)

mysql> show variables like 'coll%';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | cp850_general_ci   |
| collation_database   | latin1_swedish_ci  |
| collation_server     | utf8mb4_0900_ai_ci |
+----------------------+--------------------+
3 rows in set, 1 warning (0.00 sec)

I also have a test.sql which contains:

SET default_storage_engine=INNODB;

DROP TABLE IF EXISTS `chapitres`;

CREATE TABLE chapitres (
  id TINYINT UNSIGNED AUTO_INCREMENT NOT NULL,
  titre VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
)
DEFAULT CHARACTER SET latin1 COLLATE latin1_general_ci;

INSERT INTO chapitres
  (id, titre)
VALUES
  ('1', 'é');

If I import test.sql file as follows:

mysql> source test.sql
Query OK, 1 row affected (0.00 sec)

I don't have the desired result:

mysql> select * from chapitres;
+----+-------+
| id | titre |
+----+-------+
|  1 | Ú     |
+----+-------+
1 row in set (0.00 sec)

I was expecting an é and I got a Ú. Can you help me understand what I'm missing? Thank you very much for your help!


Solution

  • After browsing a bit further, I noticed that é is encoded as 0082 in cp850. This corresponds to 00E9 in both ISO 8859-1 and UTF-8 and Ù is 00E9 in cp850.

    I managed to solve my issue by encoding test.sql in cp850 or simply by copy/pasting its contents as an SQL statement in DBeaver.

    I'm not sure why the character_set_client and character_set_connection are set to cp850, it could be a Windows default, but I'm not ready to venture down that rabbit hole just yet. :)