
How does C# know the length of string using Binary Writer?


Please look at the code below. This program simply saves the 33-character string "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" followed by a single byte with the value 33.

using System;
using System.IO;
using System.Text;

namespace Test
{
    internal class Program
    {
        static void Main(string[] args)
        {
            string filepath = args[0];
            using (var stream = File.Open(filepath, FileMode.Create))
            {
                using (var writer = new BinaryWriter(stream, Encoding.UTF8, false))
                {
                    writer.Write(new string('!', 33));
                    writer.Write((byte)33);
                }
            }

            using (var stream = File.Open(filepath, FileMode.Open))
            {
                using (var reader = new BinaryReader(stream, Encoding.UTF8, false))
                {
                    Console.WriteLine(reader.ReadString());
                    Console.WriteLine(reader.ReadByte());
                }
            }

            Console.ReadKey();
        }
    }
}

And here is the binary representation of it:

[Image: hex dump of the written file]

Apparently, the leading 0x21 is the length of the string - but how on earth does C# know?


Solution

  • More than a year after my original question, I finally found the answer: when BinaryWriter writes a string, it first writes a length prefix - the number of bytes the string occupies in the chosen encoding.

    using static System.Text.Encoding;
    var test = "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!";
    Console.WriteLine(UTF8.GetBytes(test).Length); // Prints 33: the string occupies 33 bytes in UTF-8.
    

    In the original question, the first 0x21 is the length prefix (decimal 33); it just happens to coincide with the ASCII code of !.


    Notice that in the original image, the hex dump is 35 bytes long in total, because it includes:

    1. The length prefix (1 byte, value = 33)
    2. The UTF-8 bytes of the string itself (33 bytes, each 0x21)
    3. The extra Write((byte)33) output (1 byte, value = 33)
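    As a quick sanity check (a sketch, not from the original answer; `filepath` is assumed to point at the file written by the program in the question), dumping the raw bytes shows the same layout:

    ```csharp
    // Inspect the raw bytes of the file written by the example program.
    byte[] raw = File.ReadAllBytes(filepath);
    Console.WriteLine(raw.Length); // 35 bytes in total
    Console.WriteLine(raw[0]);     // 33: the length prefix (0x21)
    Console.WriteLine(raw[1]);     // 33: the first '!' of the string (also 0x21)
    Console.WriteLine(raw[34]);    // 33: the trailing Write((byte)33) output
    ```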

    Notice what happens when the length prefix needs more than 1 byte:

    using System.IO;
    using static System.Text.Encoding;
    var longString = new string('!', 256);
    Console.WriteLine(UTF8.GetBytes(longString).Length); // Prints 256: one byte per '!'
    var memory = new MemoryStream();
    var writer = new BinaryWriter(memory);
    writer.Write(longString);
    writer.Flush();
    memory.ToArray(); // Gives the serialized bytes, which are 258 in length!
    

    The first three bytes from above are 128, 2, 33: the length 256 is stored low-order group first as a 7-bit encoded integer (0x80 0x02), followed by the first '!' (0x21).
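    The prefix is produced by a 7-bit variable-length integer encoding (the same logic as BinaryWriter.Write7BitEncodedInt, which is public API in newer .NET versions). A minimal sketch of that encoding, assuming a helper name Encode7BitInt for illustration:

    ```csharp
    using System.Collections.Generic;

    static byte[] Encode7BitInt(int value)
    {
        var result = new List<byte>();
        uint v = (uint)value;
        while (v >= 0x80)
        {
            result.Add((byte)(v | 0x80)); // emit low 7 bits with the continuation bit set
            v >>= 7;                      // move on to the next 7-bit group
        }
        result.Add((byte)v);              // final byte: continuation bit clear
        return result.ToArray();
    }
    ```

    Encode7BitInt(33) yields the single byte 0x21, and Encode7BitInt(256) yields 0x80 0x02 - exactly the prefixes seen above.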

    See this for a comprehensive overview: How BinaryWriter.Write() writes a string