c# decoding utf-32

Decoding "strange" utf32 format


I have a file containing UTF-32 text that was read from a database. I would expect "Hi" to be stored as H\0\0\0i\0\0\0, but it is actually \0\0\0H\0\0\0i, with the null characters in front.

Does anyone know how this could happen, and how I can decode this while leaving all the data intact?


Solution

  • You appear to be getting UTF-32 in big-endian (network) byte order rather than the little-endian order you are expecting. Either order is valid for UTF-32.

    Which byte order the database uses when you ask for UTF-32 is probably controlled by that database's configuration.

    You can use IPAddress.NetworkToHostOrder to convert a single code point that has been read as a 32-bit integer, or a UTF32Encoding with the appropriate byte order to convert whole strings:

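            // Big-endian UTF-32 bytes for "Hi" (requires System and System.Text).
            var bytes = new byte[] {0,0,0,(byte)'H',0,0,0,(byte)'i'};
            // Big-endian decoder that does not expect a byte order mark.
            var encoding = new UTF32Encoding(bigEndian: true, byteOrderMark: false);
            var text = encoding.GetString(bytes);

            Console.WriteLine(text); // prints "Hi"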
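
The single-code-point route mentioned above can be sketched as follows. This is only an illustration: the class name Utf32Demo and its two helper methods are hypothetical and are not part of the original answer. It also assumes you would rather get an exception on malformed input than have invalid sequences silently replaced, which is why the three-argument UTF32Encoding constructor is used.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class Utf32Demo
{
    // Decode a whole buffer of big-endian UTF-32 that carries no byte order mark.
    // throwOnInvalidCharacters: true makes malformed input raise an exception
    // instead of being replaced with U+FFFD.
    public static string DecodeBigEndian(byte[] bytes)
    {
        var encoding = new UTF32Encoding(
            bigEndian: true,
            byteOrderMark: false,
            throwOnInvalidCharacters: true);
        return encoding.GetString(bytes);
    }

    // Read one code point stored in network (big-endian) byte order.
    // BitConverter.ToInt32 interprets the four bytes in host order, and
    // NetworkToHostOrder swaps them only when the host is little-endian.
    public static int ReadCodePoint(byte[] bytes, int offset)
    {
        int raw = BitConverter.ToInt32(bytes, offset);
        return IPAddress.NetworkToHostOrder(raw);
    }
}
```

For example, with the same bytes as above, char.ConvertFromUtf32(Utf32Demo.ReadCodePoint(bytes, 0)) gives the string "H". For whole strings the UTF32Encoding approach is usually simpler; the integer route is mainly useful when you process one code point at a time.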