I have an sql table where a column has utf8_unicode_ci
encoding, but the table itself has latin1_swedish_ci
encoding (as reported under Row Statistics
in Structure
tab of phpMyAdmin
).
The PHP webapp that accesses the database displays Japanese text correctly, but inside phpMyAdmin
everything is mojibake. The webapp (correctly) displays the Japanese text Xで有名な
, but in phpMyAdmin
it is Xã¦ã‚™æœ‰åãª
(hex()
output is 312E2058C3A3C281C2A6C3A3E2809AE284A2C3A6C593E280B0C3A5C290C28DC3A3C281C2AA
).
The app that was used to generate the data in the table is now broken, but I need to add a few new records. How can I recreate the mojibake found in the table?
I tried to reproduce the mojibake with python:
def rev_engineer(utf8):
mojibake = utf8.encode('utf8').decode('latin1')
print(mojibake)
rev_engineer('Xで有名な')
# output: Xã¦ãæåãª
# should be: Xã¦ã‚™æœ‰åãª
This is obviously very similar, but not quite there. I then tried looping through every possible encoding listed in python's documentation, and encoding/decoding every possible combination, but that did not come up with a match, either. Any idea what I'm missing?
To be sure my character will be interpreted as UTF8 sequence
test> set names utf8 ;
Query OK, 0 rows affected (0.00 sec)
check that i have 2 bytes for é
test> select hex(binary('é')) ;
+-------------------+
| hex(binary('é')) |
+-------------------+
| C3A9 |
+-------------------+
1 row in set (0.00 sec)
checking i have the same value
test ]> select convert(binary(convert(convert(unhex('312E2058C3A3C281C2A6C3A3E2809AE284A2C3A6C593E280B0C3A5C290C28DC3A3C281C2AA') using utf8 ) using latin1 )) using utf8 );
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| convert(binary(convert(convert(unhex('312E2058C3A3C281C2A6C3A3E2809AE284A2C3A6C593E280B0C3A5C290C28DC3A3C281C2AA') using utf8 ) using latin1 )) using utf8 ) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1. Xで有名な |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
by copy/paste the output string , i can reverse the process
test > select hex(convert (convert(binary('1. Xで有名な ') using latin1 ) using utf8 )) ;
+---------------------------------------------------------------------------------+
| hex(convert (convert(binary('1. Xで有名な') using latin1 ) using utf8 )) |
+---------------------------------------------------------------------------------+
| 312E2058C3A3C281C2A6C3A3E2809AE284A2C3A6C593E280B0C3A5C290C28DC3A3C281C2AA |
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
If you have a few rows to insert you insert your rows with pphpmyadmin and if it does not work directly via the command mysql .
If you want to use python you can use this module : https://pypi.org/project/mysql-latin1-codec/