I have an application converted from Python 2 (where strings are essentially lists of bytes) and I'm using a string as a convenient byte buffer.
I am rewriting some of this code in the Boo language (Python-like syntax, runs on .NET) and am finding that the strings have an intrinsic encoding type, such as ASCII, UTF-8, etc. Most of the information dealing with bytes refers to arrays of bytes, which are (apparently) fixed-length, making them quite awkward to work with.
I can obviously get bytes from a string, but at the risk of expanding some characters into multiple bytes, or discarding/altering bytes above 127, etc. This is fine, and I fully understand the reasons for it, but what would be handy for me is either (a) an encoding that guarantees no conversion or discarding of characters, so that I can use a string as a convenient byte buffer, or (b) some sort of ByteString class that gives the convenience of the string class. (Ideally the latter, as it seems less of a hack.) Does either of these already exist? (Or would either be trivial to implement?)
I am aware of System.IO.MemoryStream, but the prospect of creating one of those each time and then having to make a System.IO.StreamReader at the end just to get access to ReadToEnd() doesn't seem very efficient, and this is in performance-sensitive code.
(I hope nobody minds that I tagged this as C# as I felt the answers would likely apply there also, and that C# users might have a good idea of the possible solutions.)
EDIT: I've also just discovered System.Text.StringBuilder - again, is there such a thing for bytes?
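Use the Latin-1 encoding as described in this answer. It maps every byte value 0-255 to the character with the same code point (U+0000 to U+00FF) and back again, so values in the range 128-255 pass through unchanged and a bytes-to-string-to-bytes roundtrip is lossless.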
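A minimal C# sketch of that roundtrip (the same System.Text classes should be callable from Boo). It assumes .NET 5 or later for the Encoding.Latin1 property and uses top-level statements; for contrast it also pushes the same bytes through UTF-8, which is the lossy case described in the question.

```csharp
using System;
using System.Linq;
using System.Text;

// Every possible byte value, 0x00 through 0xFF.
byte[] original = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();

// Latin-1 maps byte n to char U+00nn and back, so nothing is lost.
// Encoding.Latin1 needs .NET 5+; on older frameworks use Encoding.GetEncoding("ISO-8859-1").
Encoding latin1 = Encoding.Latin1;
string s = latin1.GetString(original);
byte[] roundTripped = latin1.GetBytes(s);

Console.WriteLine(s.Length);                              // 256: one char per byte
Console.WriteLine(original.SequenceEqual(roundTripped));  // True

// UTF-8 cannot do this: a lone byte >= 0x80 is invalid UTF-8, decodes to
// U+FFFD, and U+FFFD re-encodes as three bytes (EF BF BD).
byte[] viaUtf8 = Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(original));
Console.WriteLine(original.SequenceEqual(viaUtf8));       // False
```

The string produced this way is only a container: characters above U+00FF will not survive Latin-1 encoding, so it works as a byte buffer only as long as every character came from bytes in the first place.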
UPDATE
Or, if you want to manipulate bytes directly, use List<byte>:
```csharp
using System.Collections.Generic;

// The values below are placeholders for illustration only.
List<byte> result = new List<byte>();

// Add a byte at the end
byte b = 0x41;
result.Add(b);

// Add a collection of bytes at the end
byte[] bytesToAppend = { 0x42, 0x43 };
result.AddRange(bytesToAppend);

// Insert a collection of bytes at any position
byte[] bytesToInsert = { 0x01, 0x02 };
int insertIndex = 1;
result.InsertRange(insertIndex, bytesToInsert);

// Remove a range of bytes
int index = 0, count = 2;
result.RemoveRange(index, count);

// ... etc ...
```
> I've also just discovered System.Text.StringBuilder - again, is there such a thing for bytes?
The StringBuilder class exists because regular strings are immutable. A List<byte> is already mutable and grows as needed, so it gives you everything you might expect from a "StringBuilder for bytes".
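A minimal sketch of List<byte> used the way a Python 2 string buffer often is, assuming C# 9 top-level statements. The "HDR:" header, the 0x00 separator and the variable names are purely illustrative. It maps a few string idioms onto existing List<T> methods: Add/AddRange for appending, IndexOf for find, GetRange for slicing, and ToArray to get a plain byte[].

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Build a buffer incrementally, like StringBuilder.Append.
var buffer = new List<byte>();
buffer.AddRange(Encoding.ASCII.GetBytes("HDR:")); // illustrative header bytes
buffer.Add((byte)0x00);                            // a raw byte, no encoding involved
buffer.AddRange(new byte[] { 0xFE, 0xFF });

// String-like operations:
int sep = buffer.IndexOf((byte)0x00);              // like buf.find("\x00")
List<byte> head = buffer.GetRange(0, sep);         // like buf[:sep] (a copy)
byte[] tail = buffer.GetRange(sep + 1, buffer.Count - sep - 1).ToArray(); // like buf[sep+1:]

// When a fixed-length array is needed (e.g. for Stream.Write), copy it out once.
byte[] result = buffer.ToArray();
Console.WriteLine(result.Length); // 7
```

Both GetRange and ToArray copy, so in performance-sensitive code it is usually better to do the appends on the list and convert to an array once at the end.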