c# encoding utf-8 cp866

Converting string from CP866 to UTF8


I have a database (MSSQL) with a table of translations for product names. One of the languages is Russian.

Example of a database entry: ¸ą¤®åą ­Øā«ģ. Using the Universal Cyrillic decoder I managed to find out that it is Прдохранитль, that the source encoding is CP866, and that I need to get it into Windows-1257 or UTF-8.

How to do this in C#?

I tried something like

string line = "¸ą¤®åą ­Øā«ģ";

Encoding cp866 = Encoding.GetEncoding("CP866");
Encoding w1257 = Encoding.GetEncoding("windows-1257");
byte[] cp866Bytes = cp866.GetBytes(line);
byte[] w1257Bytes = Encoding.Convert(cp866, w1257, cp866Bytes);
var lineFinal = w1257.GetString(w1257Bytes);

Could anyone help me?

The result for the given code is ?a?¤Raa -Oa?<g


Solution

  • Leaving aside the question of how such a string could end up in the database in the first place, you can convert it like this:

    string line = "¸ą¤®åą ­Øā«ģ";
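    // On .NET Core / .NET 5+, first call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
    Encoding w1257 = Encoding.GetEncoding("windows-1257");
    Encoding cp866 = Encoding.GetEncoding("CP866");
    // Re-encode as Windows-1257 to recover the original bytes, then decode them as CP866.
    var lineFinal = cp866.GetString(w1257.GetBytes(line));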
    

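    This works because your string appears to be CP866 bytes that were decoded as code page 1257. Encoding it back to Windows-1257 recovers the original bytes, which then need to be decoded as CP866.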

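    Note that this specific string is still a bit damaged: it decodes to Предохр нитель, while the correct word is Предохранитель (a space appears in place of а, the eighth character). However, the original string also contains a space at this position, so the damage is not a result of the decoding (you probably just copied it incorrectly into the question). In CP866, а is byte 0xA0, which Windows-1257 renders as a non-breaking space, so it was likely turned into an ordinary space along the way.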
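The same idea can be wrapped in a small reusable helper. This is only a sketch under stated assumptions: the class name `MojibakeRepair` and its methods are hypothetical (not part of the original answer), and it assumes a .NET Core / .NET 5+ target, where legacy code pages such as 866 and 1257 must be registered through `CodePagesEncodingProvider` before `Encoding.GetEncoding` can find them. It takes code page numbers rather than `Encoding` objects so that the registration is certain to run before any lookup.

```csharp
using System.Text;

// Hypothetical helper for illustration; assumes .NET Core / .NET 5+.
public static class MojibakeRepair
{
    // Runs once, before the first call to any method of this class.
    // Makes legacy code pages (866, 1257, ...) available to Encoding.GetEncoding.
    static MojibakeRepair()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Undo a wrong decode: encode the garbled text with the code page it was
    // mistakenly decoded as (recovering the original bytes), then decode those
    // bytes with the code page they were actually written in.
    public static string Repair(string garbled, int decodedAsCodePage, int actualCodePage)
    {
        byte[] originalBytes = Encoding.GetEncoding(decodedAsCodePage).GetBytes(garbled);
        return Encoding.GetEncoding(actualCodePage).GetString(originalBytes);
    }

    // .NET strings are UTF-16 in memory; UTF-8 only matters at the point
    // where the repaired text is written out as bytes.
    public static byte[] ToUtf8(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }
}
```

For the string in the question, the call would be `MojibakeRepair.Repair(line, 1257, 866)`. Once the text is a correct .NET string, no further conversion "to UTF-8" is needed inside the program; `ToUtf8` (or a writer configured with `Encoding.UTF8`) is only required when saving or sending the text. Characters that have no byte in Windows-1257 cannot be recovered this way, so it is safer to repair the data from the raw database bytes if they are available.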