c# .net boo

Encoding-free String class for handling bytes? (Or alternative approach)


I have an application converted from Python 2 (where strings are essentially lists of bytes) and I'm using a string as a convenient byte buffer.

I am rewriting some of this code in the Boo language (Python-like syntax, runs on .NET) and am finding that the strings have an intrinsic encoding type, such as ASCII, UTF-8, etc. Most of the information dealing with bytes refers to arrays of bytes, which are (apparently) fixed length, making them quite awkward to work with.

I can obviously get bytes from a string, but at the risk of expanding some characters into multiple bytes, or discarding/altering bytes above 127, etc. This is fine and I fully understand the reasons for this - but what would be handy for me is either (a) an encoding that guarantees no conversion or discarding of characters so that I can use a string as a convenient byte buffer, or (b) some sort of ByteString class that gives the convenience of the string class. (Ideally the latter as it seems less of a hack.) Does either of these already exist? (Or are they trivial to implement?)

I am aware of System.IO.MemoryStream, but the prospect of creating one of those each time and then having to make a System.IO.StreamReader at the end just to get access to ReadToEnd() doesn't seem very efficient, and this is in performance-sensitive code.

(I hope nobody minds that I tagged this as C# as I felt the answers would likely apply there also, and that C# users might have a good idea of the possible solutions.)

EDIT: I've also just discovered System.Text.StringBuilder - again, is there such a thing for bytes?


Solution

  • Use the Latin-1 (ISO-8859-1) encoding as described in this answer. It maps every byte value, including those in the range 128-255, to the character with the same code point, which is useful when you want to round-trip bytes through chars.
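
    As a minimal sketch of that round trip, the helper below decodes a byte array with ISO-8859-1 and encodes it back. The class and method names (Latin1RoundTrip, BytesToString, StringToBytes, RoundTripsAllByteValues) are made up for illustration; only Encoding.GetEncoding, GetString and GetBytes come from the framework. It assumes the string never gains characters above U+00FF, because those cannot be encoded back to a single byte.

    ```csharp
    using System;
    using System.Text;

    public static class Latin1RoundTrip
    {
        // ISO-8859-1 maps each byte value 0-255 to the Unicode code point
        // with the same number, so decoding a byte[] loses nothing.
        private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

        // Hypothetical helper: view a byte buffer as a string.
        public static string BytesToString(byte[] bytes)
        {
            return Latin1.GetString(bytes);
        }

        // Hypothetical helper: recover the bytes from such a string.
        // Characters above U+00FF would be replaced here, so keep them out.
        public static byte[] StringToBytes(string text)
        {
            return Latin1.GetBytes(text);
        }

        // Checks that all 256 byte values survive bytes -> string -> bytes.
        public static bool RoundTripsAllByteValues()
        {
            var original = new byte[256];
            for (int i = 0; i < original.Length; i++)
            {
                original[i] = (byte)i;
            }

            string asText = BytesToString(original);
            byte[] roundTripped = StringToBytes(asText);

            if (asText.Length != original.Length || roundTripped.Length != original.Length)
            {
                return false;
            }
            for (int i = 0; i < original.Length; i++)
            {
                if (roundTripped[i] != original[i])
                {
                    return false;
                }
            }
            return true;
        }
    }
    ```

    Note that each char in a .NET string takes two bytes in memory, so a string used this way is roughly twice the size of the equivalent byte array.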

    UPDATE

    Or if you want to manipulate bytes directly, use List<byte>:

    List<byte> result = ...
    ...
    // Add a byte at the end
    result.Add(b);
    // Add a collection of bytes at the end
    byte[] bytesToAppend = ...
    result.AddRange(bytesToAppend);
    // Insert a collection of bytes at any position
    byte[] bytesToInsert = ...
    int insertIndex = ...
    result.InsertRange(insertIndex, bytesToInsert);
    // Remove a range of bytes
    result.RemoveRange(index, count);
    ... etc ...
    

    "I've also just discovered System.Text.StringBuilder - again, is there such a thing for bytes?"

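    The StringBuilder class is needed because regular strings are immutable, so repeated concatenation allocates a new string each time. Byte arrays are mutable but fixed-length, whereas a List<byte> grows as needed and gives you everything you might expect from a "StringBuilder for bytes". Call ToArray() on it when you need a byte[] at the end.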
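
    To make the StringBuilder comparison concrete, here is a small sketch that uses List<byte> as a growable buffer and converts to byte[] once at the end. ByteBufferSketch, Concat, Demo and the capacityHint parameter are hypothetical names chosen for this example; the List<byte> members used (Add, AddRange, InsertRange, RemoveRange, ToArray) are the standard ones.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class ByteBufferSketch
    {
        // Appends every chunk to one growable buffer, then copies the
        // result out once, much like StringBuilder.Append + ToString.
        // capacityHint is optional; a good guess avoids some regrowth.
        public static byte[] Concat(IEnumerable<byte[]> chunks, int capacityHint = 0)
        {
            var buffer = new List<byte>(capacityHint);
            foreach (byte[] chunk in chunks)
            {
                buffer.AddRange(chunk);
            }
            return buffer.ToArray();
        }

        // Walks through the editing operations on a concrete buffer.
        public static byte[] Demo()
        {
            var buffer = new List<byte> { 0x01, 0x02 };

            // Append a single byte: 01 02 FF
            buffer.Add(0xFF);

            // Append several bytes: 01 02 FF 10 20
            buffer.AddRange(new byte[] { 0x10, 0x20 });

            // Insert at index 1: 01 AA BB 02 FF 10 20
            buffer.InsertRange(1, new byte[] { 0xAA, 0xBB });

            // Remove two bytes starting at index 3: 01 AA BB 10 20
            buffer.RemoveRange(3, 2);

            return buffer.ToArray();
        }
    }
    ```

    In performance-sensitive code it may be worth passing a realistic capacityHint, since the list otherwise reallocates its backing array as it grows.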