c# html character-encoding

Converting a C# string to a stream


To convert a string to a stream, the usual code is:

byte[] byteArray = System.Text.Encoding.ASCII.GetBytes(inputString);
MemoryStream stream = new MemoryStream(byteArray);

The key to this is the encoding. Strings in C# are Unicode (UTF-16 in memory), but text in a file is stored in whatever encoding that file is expected to use. So the encoding needs to be chosen based on where the data is going.
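
To make that concrete, here is a minimal sketch showing that the same C# string becomes different bytes depending on the encoding chosen. The class name `EncodingDemo` and the helper `Hex` are made up for illustration; only `Encoding.GetBytes` and `BitConverter.ToString` are framework calls.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper (not a framework API): returns the bytes that
    // `encoding` produces for `text`, as hyphen-separated uppercase hex.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }
}
```

For the string "caf\u00e9" ("café"), `EncodingDemo.Hex(Encoding.ASCII, ...)` gives 63-61-66-3F (the "é" is replaced by "?"), `Encoding.UTF8` gives 63-61-66-C3-A9, and `Encoding.Unicode` (UTF-16 little-endian) gives 63-00-61-00-66-00-E9-00.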

Correct me if I'm wrong, but I think that means for a .txt file use Encoding.ASCII as the file is read by text editors expecting 8-bit characters.

And when writing to an HTML file, should it be Encoding.UTF8?

And what should it be when writing a .js or .css file?


Solution

  • ASCII is a 7-bit encoding (the high bit is undefined); see https://en.wikipedia.org/wiki/ASCII. Many people confuse ASCII with what Windows has traditionally called ANSI (i.e. codepages), which are 8-bit encodings. You'll want to avoid dealing with ASCII and codepages (there's nothing to gain from them), unless you have a very specific application or a limitation in the environment you're working in (e.g. working with mainframe data).

    Since you mention only web-oriented file types, you should use UTF-8 (without BOM) for all.

    Most text editors today support UTF-8 transparently, and it is the modern standard for plain text.

    Just use UTF-8 without BOM. If you ever run into a situation where you observe an issue, come back and ask again, and we can help with the specifics of that situation, tool, editor, or process.
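
As a rough sketch of the advice above (the class and method names here are hypothetical, chosen only for illustration), the snippet from the question could be adapted like this. Note that `GetBytes` never emits a BOM on its own; the BOM only comes into play when a writer such as `File.WriteAllText` or `StreamWriter` writes the encoding's preamble, which is why the file helper passes `new UTF8Encoding(false)` instead of `Encoding.UTF8`.

```csharp
using System.IO;
using System.Text;

public static class StringStreams
{
    // UTF8Encoding(false) is a UTF-8 encoding that does not emit a BOM.
    private static readonly UTF8Encoding Utf8NoBom = new UTF8Encoding(false);

    // Hypothetical helper: encode a string as UTF-8 and wrap the bytes
    // in a readable in-memory stream positioned at the start.
    public static MemoryStream ToUtf8Stream(string text)
    {
        byte[] bytes = Utf8NoBom.GetBytes(text);
        return new MemoryStream(bytes);
    }

    // Hypothetical helper: write text to a file (.html, .js, .css, .txt)
    // as UTF-8 without a leading BOM.
    public static void WriteUtf8File(string path, string text)
    {
        File.WriteAllText(path, text, Utf8NoBom);
    }
}
```

With this, `StringStreams.ToUtf8Stream(inputString)` replaces the two lines from the question, and non-ASCII characters survive the round trip instead of turning into "?".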