Tags: mysql, utf-8, character-encoding, chinese-locale, mojibake

Identify encoding and convert characters


Every once in a while, a customer from China will place an order on my site, and sometimes their name and address information will be written to the MySQL database in a series of characters that I can't identify or translate.

For example, the ship-to city of a recent customer appears to me like this:

·s¥_¥«ªo¾ô°Ï

I can say for certain that the database itself, and the fields that hold the customer information, are set to utf8_general_ci collation. The website itself used to have in its header the following content type declaration:

<meta http-equiv="content-type" content="text/html; charset=UTF-8" /> 

but this has been commented out in recent weeks, I believe in an attempt to discover why some Chinese order information was stored in characters like that. Before it was commented out, the same information above would appear like this:

�s�_���o����

Is there an online utility I can use to translate blocks of text in either of those formats into something readable that DHL or another shipping service can use? And how can I reliably prevent information from being stored that way in the future?


Solution

  • Here's one such online service: http://www.mandarintools.com/email.html

    And this is how it fixed your mojibake: 新北市油橋區

    As for MySQL:

    "utf8_general_ci collation"

    Collation means "how the strings are compared". If the encoding is not set correctly at every step, the collation is meaningless. You need to use UTF-8 everywhere: for the database's character set and for the database connection. Of course, it's possible that the data in the database is actually stored in GB-something or EUC-something; I can't tell from the information you provided.
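
If you would rather not paste customer data into an online tool, the same repair can be sketched locally in Python. This is a minimal sketch under assumptions, not a general fix: the helper name `repair_mojibake` is hypothetical, and the guess that the bytes are Big5 being displayed as Latin-1 is only an inference from the first sample in the question (it reproduces the string given above). Other orders may need a different pair of encodings. The second rendering, the one full of � characters, most likely cannot be repaired from the displayed text, because the replacement character stands in for bytes that have already been discarded.

```python
# Hypothetical helper: the name and the default encodings are assumptions
# that happen to fit the first sample in the question.
def repair_mojibake(text, shown_as="latin-1", really="big5"):
    """Undo a wrong decode: recover the raw bytes, then decode them properly."""
    raw = text.encode(shown_as)   # back to the bytes that were actually stored
    return raw.decode(really)     # reinterpret those bytes in the real encoding


garbled = "·s¥_¥«ªo¾ô°Ï"          # ship-to city as it appears in the question
print(repair_mojibake(garbled))   # 新北市油橋區
```

If either step raises `UnicodeEncodeError` or `UnicodeDecodeError`, the assumed pair of encodings is wrong for that value, and it is worth trying other candidates (for example `gbk` or `euc_jp` for `really`) before trusting the result.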