c# unicode encoding character-encoding ansi

How to write with a single-byte character encoding?


I have a web service that returns a config file to a low-level hardware device. The manufacturer of this device tells me that it only supports single-byte character sets for this config file.

According to this wiki page, the following should be single-byte character sets:

  • ISO 8859
  • ISO/IEC 646 (I could not find this one here)
  • various Microsoft/IBM code pages

But when I call Encoding.GetMaxByteCount(1) on these character sets it always returns 2.

I also tried various other encodings (for instance IBM437), but GetMaxByteCount also returns 2 for other character sets.
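The observation can be reproduced with a small sketch like the one below. It assumes only encodings that every .NET runtime ships with (us-ascii, iso-8859-1, utf-8); code pages such as IBM437 may need an extra encoding provider on newer runtimes, so they are left out. The class and method names (SingleByteProbe, Probe) are made up for illustration.

```csharp
using System;
using System.Text;

public static class SingleByteProbe
{
    // Hypothetical helper: returns what the encoding reports about itself
    // next to the number of bytes it actually needs for the given text.
    public static (bool IsSingleByte, int MaxByteCountForOneChar, int ActualByteCount)
        Probe(Encoding encoding, string text)
    {
        return (encoding.IsSingleByte,
                encoding.GetMaxByteCount(1),
                encoding.GetByteCount(text));
    }

    public static void Main()
    {
        // utf-8 is included only as a multi-byte contrast.
        foreach (string name in new[] { "us-ascii", "iso-8859-1", "utf-8" })
        {
            Encoding encoding = Encoding.GetEncoding(name);
            var result = Probe(encoding, "A");
            Console.WriteLine(
                $"{name}: IsSingleByte={result.IsSingleByte}, " +
                $"GetMaxByteCount(1)={result.MaxByteCountForOneChar}, " +
                $"GetByteCount(\"A\")={result.ActualByteCount}");
        }
    }
}
```

For the two single-byte encodings this reports IsSingleByte as true and GetMaxByteCount(1) as 2, while the actual byte count for one character is 1.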

The property Encoding.IsSingleByte seems unreliable, according to this:

You should be careful in what your application does with the value for IsSingleByte. An assumption of how an Encoding will proceed may still be wrong. For example, Windows-1252 has a value of true for Encoding.IsSingleByte, but Encoding.GetMaxByteCount(1) returns 2. This is because the method considers potential leftover surrogates from a previous decoder operation.

Also, the method Encoding.GetMaxByteCount has some of the same issues, according to this:

Note that GetMaxByteCount considers potential leftover surrogates from a previous decoder operation. Because of the decoder, passing a value of 1 to the method retrieves 2 for a single-byte encoding, such as ASCII. Your application should use the IsSingleByte property if this information is necessary.

Because of this, I am no longer sure what to use.

Further reading.


Solution

  • Basically, GetMaxByteCount considers an edge case that you will probably never hit in regular code, specifically what the documentation says about the decoder and surrogates. The point is that some code points are encoded as surrogate pairs (two UTF-16 chars), and in unfortunate cases a pair can straddle two calls to GetBytes() / GetChars() on the encoder/decoder. As a consequence, the implementation may still have a single byte/char buffered and waiting to be processed, so GetMaxByteCount has to allow for it in its worst-case estimate.

    However! All of this only makes sense if you are using the encoder/decoder directly. If you are using operations on the Encoding, such as Encoding.GetBytes, then all of this is abstracted away from you and you will never need to know. In which case, just use IsSingleByte and you'll be fine.
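Both halves of this can be sketched in a few lines. The first method shows the buffering case with a stateful Encoder: a surrogate pair is split across two GetBytes calls, so the first call emits nothing and the second emits the whole sequence (UTF-8 is used here only because it makes the effect easy to see). The second method is the everyday case: a one-shot Encoding.GetBytes call with a single-byte encoding. The names ConfigEncoder, ToSingleByte and EncodeSplitSurrogatePair, the choice of ISO 8859-1, and the sample config line are assumptions for illustration; the exception fallback is an optional extra that makes unrepresentable characters fail loudly instead of turning into '?'.

```csharp
using System;
using System.Text;

public static class ConfigEncoder
{
    // Stateful path: feed a surrogate pair to an Encoder one char at a time.
    // Returns the number of bytes written by the first and second call.
    public static (int First, int Second) EncodeSplitSurrogatePair()
    {
        char[] pair = "\U0001F600".ToCharArray(); // high surrogate, low surrogate
        Encoder encoder = Encoding.UTF8.GetEncoder();
        byte[] buffer = new byte[8];

        // flush: false tells the encoder more input may follow, so it keeps
        // the lone high surrogate buffered instead of emitting anything.
        int first = encoder.GetBytes(pair, 0, 1, buffer, 0, false);
        // The low surrogate completes the pair; the full sequence comes out now.
        int second = encoder.GetBytes(pair, 1, 1, buffer, first, true);
        return (first, second);
    }

    // Stateless path (hypothetical helper): one call on the Encoding itself.
    // Throws EncoderFallbackException for characters ISO 8859-1 cannot represent.
    public static byte[] ToSingleByte(string text)
    {
        Encoding latin1 = Encoding.GetEncoding(
            "iso-8859-1",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        if (!latin1.IsSingleByte)
            throw new InvalidOperationException("Expected a single-byte encoding.");

        return latin1.GetBytes(text);
    }

    public static void Main()
    {
        var split = EncodeSplitSurrogatePair();
        Console.WriteLine($"Encoder calls wrote {split.First} and {split.Second} bytes");

        string config = "baud=9600\r\n"; // made-up sample config line
        byte[] bytes = ToSingleByte(config);
        Console.WriteLine($"{config.Length} chars -> {bytes.Length} bytes");
    }
}
```

With a single-byte encoding and the exception fallback, the byte array returned by ToSingleByte has exactly one byte per char of the input, which is the property the device needs.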