c# encoding arrays utf-7

Encoding.UTF7.GetBytes does not reverse Encoding.UTF7.GetString()


I guess I'm missing something fundamental, but I'm really confused by this one and searching hasn't turned up anything.

I have the following...

byte[] bytes1;
string string1;
byte[] bytes2;

Then I do the following

bytes1 = new byte[] { 64, 55, 121, 54, 36, 72, 101, 118, 38, 40, 100, 114, 33, 110, 85, 94, 112, 80, 163, 36, 84, 103, 58, 126 };
string1 = System.Text.Encoding.UTF7.GetString(bytes1);
bytes2 = System.Text.Encoding.UTF7.GetBytes(string1);

bytes2 ends up as 54 bytes instead of 24, and the bytes are completely different.

Now of course this code is pointless on its own, but I put it in while diagnosing why the results I'm getting from Encoding.UTF7.GetString are not what I'm expecting. I have narrowed it down to this being the reason my code is not giving the expected results.

Now I'm confused. I know that if I don't specify an encoding, the result of GetBytes on a string can't be relied on to be a particular set of bytes, but I am specifying an encoding and still getting this difference.

Can anyone enlighten me as to what I'm missing?

EDIT: Conclusion is that it's not UTF7. The original byte array is being written to a varbinary in a database by an application I'm programming in a high level language. I have no control of how the original strings are being encoded to varbinaries in that language. I'm trying to read them and handle them in a small C# add-on to the main app which is where I hit this problem. Other encodings I've tried also don't give the right results.


Solution

  • UTF-7 (7-bit Unicode Transformation Format) is a variable-length character encoding that was proposed for representing Unicode text using a stream of ASCII characters. (C) Wikipedia

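    Your byte array contains sequences that are not valid UTF-7. For example, the value 163 does not fit in 7 bits, so it cannot appear in a UTF-7 stream. In addition, characters such as @, $, &, !, ^ and ~ are not written directly by Encoding.UTF7.GetBytes by default; they are emitted as base64 blocks of the form +...-, which is why bytes2 is longer than bytes1 and contains different values.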
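To see the difference for yourself, here is a minimal sketch that decodes and re-encodes the bytes from the question with two encodings. The class and method names (Utf7RoundTripDemo, RoundTrip, RoundTrips) are made up for illustration. ISO-8859-1 is used only as an example of a single-byte encoding that maps every byte value to a character; it is an assumption, not a confirmed answer to how the other application encodes its strings.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Utf7RoundTripDemo
{
    // Decode the bytes to a string, then encode that string back with the same encoding.
    public static byte[] RoundTrip(Encoding encoding, byte[] data)
    {
        return encoding.GetBytes(encoding.GetString(data));
    }

    // True only when decoding and re-encoding reproduces the original bytes exactly.
    public static bool RoundTrips(Encoding encoding, byte[] data)
    {
        return RoundTrip(encoding, data).SequenceEqual(data);
    }

    public static void Main()
    {
        // The byte array from the question.
        byte[] bytes1 =
        {
            64, 55, 121, 54, 36, 72, 101, 118, 38, 40, 100, 114,
            33, 110, 85, 94, 112, 80, 163, 36, 84, 103, 58, 126
        };

#pragma warning disable SYSLIB0001 // Encoding.UTF7 is marked obsolete on .NET 5 and later
        Encoding utf7 = Encoding.UTF7;
#pragma warning restore SYSLIB0001

        // Assumption for illustration only: a single-byte encoding such as ISO-8859-1.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        byte[] viaUtf7 = RoundTrip(utf7, bytes1);
        Console.WriteLine("UTF-7:      {0} bytes, identical: {1}",
            viaUtf7.Length, viaUtf7.SequenceEqual(bytes1));
        // The re-encoded UTF-7 output is plain ASCII, so it can be printed to inspect the +...- blocks.
        Console.WriteLine(Encoding.ASCII.GetString(viaUtf7));

        byte[] viaLatin1 = RoundTrip(latin1, bytes1);
        Console.WriteLine("ISO-8859-1: {0} bytes, identical: {1}",
            viaLatin1.Length, viaLatin1.SequenceEqual(bytes1));
    }
}
```

If the single-byte round trip reproduces the input but the decoded text still looks wrong, the source application is probably using a different code page, and you would need to find out which one rather than guess.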