mysql, ruby-on-rails, utf-8, character-encoding, ruby-on-rails-2

How can I know what encoding is used for a given hex value?


I'm upgrading an application from Rails 2.3 to Rails 5. One problem we have is with encodings in the db; we are using MySQL.

On the Rails 2.3 application, if you query the db for our field you get the valid symbol, for example:

€

If you look directly in the db:

â‚¬

Checking the hex representation:

select HEX(txt) from table;
+----------------+
| HEX(txt)       |
+----------------+
| C3A2E2809AC2AC |
+----------------+
1 row in set (0.00 sec)

If I save exactly the same character in the Rails 5 version of the app, I get the correct value when I query the db directly.

From the length of the hex I thought it was UTF-16, but it is not:

SELECT CHAR(0xC3A2E2809AC2AC USING utf16);
+-----------------------------------+
| CHAR(0xC3A2E2809AC2AC USING utf16) |
+-----------------------------------+
| 肚슬                              |
+-----------------------------------+
1 row in set (0.00 sec)
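
The byte count alone argues against UTF-16: the value is 7 bytes long, and UTF-16 is built from 2-byte code units. As a quick check outside MySQL, here is a minimal Python sketch (Python is just a convenient tool here, not part of the application) that decodes the same bytes as UTF-8. They decode cleanly, but to three characters rather than one:

```python
raw = bytes.fromhex("C3A2E2809AC2AC")

# 7 bytes: an odd length cannot be made of whole 2-byte UTF-16 code units.
print(len(raw))  # 7

# The bytes are valid UTF-8, but they spell three characters, not one.
text = raw.decode("utf-8")
print(text)  # â‚¬
print([f"U+{ord(ch):04X}" for ch in text])  # ['U+00E2', 'U+201A', 'U+00AC']
```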

Now, if I know that 0xC3A2E2809AC2AC represents a €, is it possible to find out in which charset that representation is accurate?

I think that the MySQL adapter, mysql (2.8.1), is doing some conversion, but I'm not able to find any documentation about this.

The field collation is utf8_general_ci and the db character set is utf8.


Solution

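  • The stored bytes are double-encoded UTF-8: the UTF-8 bytes of € were read as latin1 and encoded as UTF-8 a second time. To convert them into proper UTF-8, export and import the table like this: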

    mysqldump -u db_user -p --opt --default-character-set=latin1 --skip-set-charset db_name db_table > some_file.sql
    

    Note the --skip-set-charset option, which stops mysqldump from writing a SET NAMES statement into the dump.

    Then import it with:

    mysql -u db_user -p --default-character-set=utf8 db_name < some_file.sql
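
The dump and import round trip can be mimicked outside MySQL to see why it works. The sketch below is Python and is not part of the original answer. It uses Python's cp1252 codec as a stand-in for MySQL's latin1 (an assumption; the two agree for the characters involved here), and `repair_double_encoded` is a hypothetical helper name. Encoding to cp1252 plays the role of the latin1 dump, and decoding as UTF-8 plays the role of the utf8 import:

```python
def repair_double_encoded(text: str) -> str:
    """Undo one round of UTF-8 -> cp1252 -> UTF-8 double encoding.

    Hypothetical helper for illustration. It is a heuristic: the input is
    returned unchanged when the round trip fails, which is what happens
    for most text that was stored correctly in the first place.
    """
    try:
        # Mirrors the latin1 dump: map each character back to a single byte.
        original_bytes = text.encode("cp1252")
        # Mirrors the utf8 import: read those bytes as UTF-8.
        return original_bytes.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


stored = bytes.fromhex("C3A2E2809AC2AC").decode("utf-8")  # what the column holds
print(repair_double_encoded(stored))   # €
print(repair_double_encoded("€"))      # € (already correct, left alone)
print(repair_double_encoded("plain"))  # plain
```

The fallback matters because the question mentions that the Rails 5 app already stores the correct value. A correctly stored € would likely not survive the latin1 dump and utf8 import, so if the table mixes rows written by both versions, it may be worth trying the procedure on a copy first.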