Visual Studio C++ 2008 Manipulating Bytes?


I'm trying to write strictly binary data to files (no encoding). When I hex dump the files, I see rather weird behavior, and either of the methods below for creating the file gives the same result. I also tried passing System::Text::Encoding::Default to the streams as a test.

StreamWriter^ binWriter = gcnew StreamWriter(gcnew FileStream("test.bin",FileMode::Create));

(Also used this method)
FileStream^ tempBin = gcnew FileStream("test.bin",FileMode::Create);
BinaryWriter^ binWriter = gcnew BinaryWriter(tempBin);


binWriter->Write(0x80);
binWriter->Write(0x81);
.
.
binWriter->Write(0x8F);
binWriter->Write(0x90);
binWriter->Write(0x91);
.
.
binWriter->Write(0x9F);

Writing that sequence of bytes, I noticed the only bytes that weren't converted to 0x3F in the hex dump were 0x81,0x8D,0x90,0x9D, ... and I have no idea why.

I also tried making character arrays, and something similar happens, e.g.:

array<wchar_t,1>^ OT_Random_Delta_Limits = {0x00,0x00,0x03,0x79,0x00,0x00,0x04,0x88};
binWriter->Write(OT_Random_Delta_Limits);

0x88 would be written as 0x3F.


Solution

  • If you want to stick to binary files, don't use StreamWriter. Just use a FileStream and Write/WriteByte. StreamWriters (and TextWriters in general) are expressly designed for text. Whether you want an encoding or not, one will be applied, because calling StreamWriter.Write writes a char, not a byte.

    Don't create arrays of wchar_t values either - again, those are for characters, i.e. text.

    BinaryWriter.Write should have worked for you, unless it was promoting the values to char, in which case you'd have exactly the same problem.

    By the way, without specifying any encoding, I'd expect you not to get 0x3F values at all, but instead the bytes of the UTF-8 encoding of those characters, since StreamWriter uses UTF-8 by default.

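    When you specified Encoding.Default, you'd have seen 0x3F (the '?' character) for any Unicode character that can't be represented in that encoding. That would probably also explain the few values that survived: if your default encoding is Windows-1252 (I'm guessing here), byte values such as 0x81, 0x8D, 0x90 and 0x9D are unassigned in it and tend to pass straight through, while most of the rest of the 0x80-0x9F range is taken by other characters.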

    Anyway, the basic lesson is to stick to Stream when you want to deal with binary data rather than text.

    EDIT: Okay, it would be something like:

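    // Needs "using System;" and "using System.IO;" at the top of the file.
    // Reads pairs of hex digits from input and writes each pair as one raw byte.
    public static void ConvertHex(TextReader input, Stream output)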
    {
        while (true)
        {
            int firstNybble = input.Read();
            if (firstNybble == -1)
            {
                return;
            }
            int secondNybble = input.Read();
            if (secondNybble == -1)
            {
                throw new IOException("Reader finished half way through a byte");
            }
            int value = (ParseNybble(firstNybble) << 4) + ParseNybble(secondNybble);
            output.WriteByte((byte) value);
        }
    }
    
    // value would actually be a char, but as we've got an int in the above code,
    // it just makes things a bit easier
    private static int ParseNybble(int value)
    {
        if (value >= '0' && value <= '9') return value - '0';
        if (value >= 'A' && value <= 'F') return value - 'A' + 10;
        if (value >= 'a' && value <= 'f') return value - 'a' + 10;
        throw new ArgumentException("Invalid nybble: " + (char) value);
    }
    

    This is very inefficient in terms of buffering etc., but it should get you started.
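
Since the question is written in C++, here is a rough port of the same hex-to-bytes idea to plain standard C++. Treat it as a sketch rather than a drop-in replacement: it deliberately uses std::istream and std::ostream instead of the .NET stream classes, and the names parseNybble, convertHex and hexToBytes are made up for this example.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: turns one hex digit into its value 0..15.
int parseNybble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    throw std::invalid_argument(std::string("Invalid nybble: ") + c);
}

// Hypothetical helper: reads pairs of hex digits from input and writes
// each pair to output as a single raw byte.
void convertHex(std::istream& input, std::ostream& output)
{
    char first;
    while (input.get(first))
    {
        char second;
        if (!input.get(second))
        {
            throw std::runtime_error("Input finished half way through a byte");
        }
        int value = (parseNybble(first) << 4) + parseNybble(second);
        assert(value >= 0 && value <= 0xFF); // two nybbles always fit in one byte
        output.put(static_cast<char>(value));
    }
}

// Example use: converts hex text held in a string and returns the raw bytes.
std::string hexToBytes(const std::string& hex)
{
    std::istringstream input(hex);
    std::ostringstream output;
    convertHex(input, output);
    return output.str();
}
```

To write to a file instead, pass a std::ofstream opened with std::ios::binary as the output, so the runtime does no newline translation on Windows. The standard streams keep their own internal buffer, so the per-character get and put calls cost less than they look.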