Search code examples
c#phpmysqlunicodemysql.data

Convert MySQL non-unicode characters in C#


I have a PHP application that currently stores data in MySQL tables in non-conventional format(I assume this is because it is using a non-unicode mysql connection).

Example, this is one of the customer names as shown in PHP app UI:

DILORIO’S AUTO BODY SHOP

Notice there is a difference in apostrophe between it and the following.

DILORIO'S AUTO BODY SHOP

The latter one uses a standard latin apostrophe as oppose to unicode(i guess) style.

This name is stored in DB table like so:

DILORIO’S AUTO BODY SHOP

When it is being pulled from DB and displayed in UI it all looks correct, but the problem arised when I started to use MYSQL.Data C# connector to pull the same data.

At first I thought I should be able to just bull the value byte array and then convert it to latin1 (I assumed this is a default for PHP), however none of the existing encodings seemed to get me the result I wanted and this is what I get:

enter image description here

this is a DB collation for the field in mysql and how it looks: enter image description here enter image description here

Ideally I want to get rid of all corrupt data in DB and fix the PHP connection to unicode. But at this point it would be nice to just read whats already in there the same way as PHP is able to.

I also tried it with Encoding convert in all different combinations but no luck here either: enter image description here


Solution

  • The text is encoded with Windows-1252, not Latin1, which is why your attempts to decode it above failed. Once you convert the string to Windows-1252 bytes, then decode that using UTF-8, you should have the correct value:

    // note: on .NET 6.0, add 'System.Text.Encoding.CodePages' and call this line of code:
    // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    
    var windows1252 = Encoding.GetEncoding(1252);
    var utf8Bytes = windows1252.GetBytes("DILORIO’S AUTO BODY SHOP");
    var correct = Encoding.UTF8.GetString(utf8Bytes);
    // correct == "DILORIO’S AUTO BODY SHOP"