I have a text file. When I open it in Notepad, its contents are shown as:
Ê¸³ßÓÀ¼ª
If I drag it into the Chrome browser, it is automatically decoded and displayed correctly as
矢尺永吉
After a bit of research, I found that the file is encoded with GB18030. I am attempting to do the conversion in C#. Below is my code:
public static string codeCovert(string s)
{
    Encoding gb18 = Encoding.GetEncoding("gb18030");
    Encoding Utf8 = Encoding.UTF8;
    byte[] gbcode = gb18.GetBytes(s);
    return Utf8.GetString(gbcode);
}
And this still gives a whole bunch of wrong characters. Can anyone help please? Thanks.
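Your method takes in a string and returns another string, which does not make sense. A System.String is a "vector" of UTF-16 code units, so it has no GB18030 or UTF-8 form of its own; an encoding only applies when converting between bytes and strings.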
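To illustrate that the encoding matters only at the boundary between bytes and strings, here is a minimal sketch. It assumes the file holds the eight GB18030 bytes for "矢尺永吉", that Notepad fell back to the Windows-1252 code page, and that the code runs on .NET Core or .NET 5+ (hence the provider registration). The names EncodingDemo, FileBytes and Decode are made up for this example.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Assumed file contents: the GB18030 bytes for "矢尺永吉".
    public static readonly byte[] FileBytes =
        { 0xCA, 0xB8, 0xB3, 0xDF, 0xD3, 0xC0, 0xBC, 0xAA };

    // Decodes raw bytes into a .NET string using the named encoding.
    public static string Decode(byte[] bytes, string encodingName)
    {
        // Makes code-page encodings available on .NET Core / .NET 5+.
        // Registering the same provider more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // Wrong encoding: the same bytes come out as Latin-1 style mojibake,
        // similar to what Notepad displayed.
        Console.WriteLine(Decode(FileBytes, "windows-1252"));

        // Right encoding: prints 矢尺永吉 (if the console font supports it).
        Console.WriteLine(Decode(FileBytes, "GB18030"));
    }
}
```

The bytes are identical in both calls; only the encoding used to decode them differs. Once the text has been decoded with the wrong encoding, calling GetBytes on the resulting string with GB18030 does not give back the original bytes, which is why the method in the question produces garbage.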
You should do:
using System.Text;
using System.IO;
// ...
var str = File.ReadAllText(@"path\file.txt", Encoding.GetEncoding("GB18030"));
While str is in memory, it has the value "矢尺永吉". It cannot be "UTF-8" when it is a .NET string in memory. You can save it to another file, of course:
File.WriteAllText(@"path\otherfile.txt", str, Encoding.UTF8);
Edit: In newer versions of .NET (.NET Core and .NET 5 or later), you need to call:
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
before you can use Encoding.GetEncoding("GB18030").
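Putting the pieces together, a small helper might look like the sketch below. It assumes .NET Core or .NET 5+, and the class name Gb18030Converter and method name ConvertFileToUtf8 are hypothetical. It also writes UTF-8 without a byte order mark, which is a choice made for this example; passing Encoding.UTF8 to File.WriteAllText instead writes a BOM at the start of the file.

```csharp
using System.IO;
using System.Text;

public static class Gb18030Converter
{
    // Reads a GB18030-encoded text file and writes its text back out as UTF-8.
    public static void ConvertFileToUtf8(string sourcePath, string destinationPath)
    {
        // Required on .NET Core / .NET 5+ before GB18030 can be looked up.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding gb18030 = Encoding.GetEncoding("GB18030");

        // Decoding happens here: bytes on disk become a UTF-16 .NET string.
        string text = File.ReadAllText(sourcePath, gb18030);

        // Encoding happens here: the string becomes UTF-8 bytes, without a BOM.
        var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        File.WriteAllText(destinationPath, text, utf8NoBom);
    }
}
```

It could be called as Gb18030Converter.ConvertFileToUtf8(@"path\file.txt", @"path\otherfile.txt");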