Tags: c#, character-encoding, streamwriter, byte-order-mark

Clone encoding but turn off BOM


Say I have an encoding:

Encoding enc;

When this encoding is passed to me, it is set up to emit a BOM. I'm not interested in BOMs: in my system, encodings are declared in headers.

Assuming encodings are immutable, I'd like to create a new encoding that matches the existing one exactly, except that it no longer emits a BOM.

This is so I can avoid the following mismatch:

var data = "áéíóúñ";
var enc = Encoding.UTF8;
long count1 = (long) enc.GetByteCount(data);
long count2;
using(var ms = new MemoryStream())
using(var sw = new StreamWriter(ms, enc))
{
    sw.Write(data);
    sw.Flush();
    count2 = ms.Length;
}
count1.Dump(); // 12 (Dump() is LINQPad's output helper)
count2.Dump(); // 15: oops, the BOM was also written
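
The three extra bytes are the encoding's preamble. StreamWriter writes it at the start of an empty stream, while GetByteCount only counts the encoded characters. A minimal sketch of where the difference comes from, using Console.WriteLine instead of LINQPad's Dump() (the class name PreambleDemo is just for illustration):

```csharp
using System;
using System.Text;

class PreambleDemo
{
    static void Main()
    {
        var data = "áéíóúñ";
        var enc = Encoding.UTF8; // the shared instance is configured to emit a BOM

        byte[] preamble = enc.GetPreamble();  // the BOM bytes, if any
        int payload = enc.GetByteCount(data); // encoded characters only, no BOM

        Console.WriteLine(BitConverter.ToString(preamble)); // EF-BB-BF
        Console.WriteLine(payload + preamble.Length);       // 15
    }
}
```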

Solution

  • var enc = new UTF8Encoding(false); // UTF-8 without BOM
    

    If you don't know the encoding in advance, then you need a bit of extra logic to map each BOM-emitting Unicode encoding to its BOM-less counterpart, e.g.

    switch (enc.CodePage) {
    case 65001: // UTF-8
        enc = new UTF8Encoding(false);
        break;
    case 1200:  // UTF-16 little-endian
        enc = new UnicodeEncoding(false, false);
        break;
    case 1201:  // UTF-16 big-endian
        enc = new UnicodeEncoding(true, false);
        break;
    case 12000: // UTF-32 little-endian
        enc = new UTF32Encoding(false, false);
        break;
    case 12001: // UTF-32 big-endian
        enc = new UTF32Encoding(true, false);
        break;
    default:
        // pass through the original enc unchanged
        break;
    }
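
The code page mapping above could be wrapped in a small helper so that callers don't repeat the switch. This is only a sketch: WithoutBom and EncodingExtensions are hypothetical names, not part of the framework. It also does not copy any custom EncoderFallback or DecoderFallback from the original encoding, so treat the result as "same code page, no BOM" rather than an exact clone.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingExtensions
{
    // Hypothetical helper: returns a BOM-less counterpart for the Unicode
    // encodings listed above, or the original instance for anything else.
    public static Encoding WithoutBom(this Encoding enc)
    {
        if (enc.GetPreamble().Length == 0)
            return enc; // already emits no BOM, nothing to do

        switch (enc.CodePage)
        {
            case 65001: return new UTF8Encoding(false);           // UTF-8
            case 1200:  return new UnicodeEncoding(false, false); // UTF-16 LE
            case 1201:  return new UnicodeEncoding(true, false);  // UTF-16 BE
            case 12000: return new UTF32Encoding(false, false);   // UTF-32 LE
            case 12001: return new UTF32Encoding(true, false);    // UTF-32 BE
            default:    return enc; // unknown code page: pass through unchanged
        }
    }
}

class Program
{
    static void Main()
    {
        var data = "áéíóúñ";
        var enc = Encoding.UTF8.WithoutBom();

        long count1 = enc.GetByteCount(data);
        long count2;
        using (var ms = new MemoryStream())
        using (var sw = new StreamWriter(ms, enc))
        {
            sw.Write(data);
            sw.Flush();
            count2 = ms.Length;
        }

        // Both counts now describe the same bytes, since no preamble is written.
        Console.WriteLine(count1 == count2); // True
    }
}
```

Checking GetPreamble() first means an encoding that is already BOM-less is returned as-is, so any settings on that instance are kept.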