c#-4.0 encode

How to convert saved text file encoding to UTF8?


I recently saved a text file on my computer, but when I opened it again I saw strings like this:

 "˜ÌÇí ÍÑÝã ÚÌíÈå¿"

Is it possible to convert it back to the original text (UTF8)?

I tried this code, but it doesn't work:

  string tempStr="˜ÌÇí ÍÑÝã ÚÌíÈå¿"; 
  Encoding ANSI = Encoding.GetEncoding(1256);
  byte[] ansiBytes = ANSI.GetBytes(tempStr);
  byte[] utf8Bytes = Encoding.Convert(ANSI, Encoding.UTF8, ansiBytes);
  String utf8String = Encoding.UTF8.GetString(utf8Bytes);

Solution

  • You can use something like:

    string str = Encoding.GetEncoding(1256).GetString(Encoding.GetEncoding("iso-8859-1").GetBytes(tempStr));
    

    The string wasn't really decoded. Its bytes were simply "widened" to chars, with something like:

    byte[] bytes = ...
    char[] chars = new char[bytes.Length];
    for (int i = 0; i < bytes.Length; i++)
    {
        chars[i] = (char)bytes[i]; // explicit cast: C# has no implicit byte-to-char conversion
    }
    string str = new string(chars);
    

    This transformation is exactly what decoding with the codepage ISO-8859-1 does: each byte value becomes the char with the same numeric value. So I could either write the reverse loop myself or let that codepage do it for me. I chose the second option:

    Encoding.GetEncoding("iso-8859-1").GetBytes(tempStr)
    

    This gave me the original byte[].
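
    As a quick sanity check of that claim, here is a minimal sketch. The class and method names (Latin1RoundTrip, Widen, RoundTrips) are made up for illustration and are not part of the original answer. It widens bytes to chars as in the loop above, then asks ISO-8859-1 to turn the string back into bytes:

    ```csharp
    using System;
    using System.Linq;
    using System.Text;

    static class Latin1RoundTrip
    {
        // Widens each byte to the char with the same numeric value (0..255),
        // mirroring the loop shown in the answer.
        public static string Widen(byte[] bytes)
        {
            char[] chars = new char[bytes.Length];
            for (int i = 0; i < bytes.Length; i++)
            {
                chars[i] = (char)bytes[i];
            }
            return new string(chars);
        }

        // Returns true if encoding the widened string with ISO-8859-1
        // yields exactly the bytes we started from.
        public static bool RoundTrips(byte[] original)
        {
            byte[] recovered = Encoding.GetEncoding("iso-8859-1").GetBytes(Widen(original));
            return original.SequenceEqual(recovered);
        }
    }
    ```

    Calling Latin1RoundTrip.RoundTrips on an array holding every byte value from 0 to 255 should return true, which is why ISO-8859-1 can stand in for the hand-written reverse loop.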

    Then I ran some tests, and it seems the original text wasn't UTF8 at all: it was in codepage 1256, which is an Arabic codepage. So I decoded the bytes with that codepage:

    string str = Encoding.GetEncoding(1256).GetString(...);
    

    One caveat: the ˜ doesn't seem to be part of the original string.

    There is another possibility:

    string str = Encoding.GetEncoding(1256).GetString(Encoding.GetEncoding(1252).GetBytes(tempStr));
    

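    Codepage 1252 is the one used in the USA and in a large part of Europe. If your Windows is configured for English, there is a good chance it uses 1252 as the default codepage. The result differs slightly from the ISO-8859-1 version: ˜ (U+02DC) exists in codepage 1252 as byte 0x98 but has no exact equivalent in ISO-8859-1, so only the 1252 version can recover the character behind it.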
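
    Putting the pieces together, the following is a rough sketch of the whole repair, not a definitive implementation. MojibakeRepair, Repair and SaveAsUtf8 are hypothetical names, and the sketch assumes the text was written in codepage 1256 and wrongly read as codepage 1252. It undoes the wrong decoding and can then write the result to disk as UTF8:

    ```csharp
    using System;
    using System.IO;
    using System.Text;

    static class MojibakeRepair
    {
        // Re-encodes the garbled text with the codepage it was wrongly decoded
        // with, then decodes those bytes with the codepage it was written in.
        public static string Repair(string garbled, int wrongCodePage, int intendedCodePage)
        {
            // Assumption: running on .NET Core / .NET 5+, where codepages such
            // as 1252 and 1256 must be registered first. On the older .NET
            // Framework this type may not exist; remove the line there.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

            byte[] raw = Encoding.GetEncoding(wrongCodePage).GetBytes(garbled);
            return Encoding.GetEncoding(intendedCodePage).GetString(raw);
        }

        // Writes the repaired text as UTF8 without a byte order mark.
        public static void SaveAsUtf8(string path, string text)
        {
            File.WriteAllText(path, text, new UTF8Encoding(false));
        }
    }
    ```

    With the string from the question this would be called as MojibakeRepair.Repair(tempStr, 1252, 1256). If the file itself is still on disk, reading it directly with File.ReadAllText(path, Encoding.GetEncoding(1256)) is likely the safer route, because it avoids the lossy trip through a garbled string.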