c# .net memorystream

How to read a binary file quickly in c#? (ReadOnlySpan vs MemoryStream)


I'm trying to parse a binary file as fast as possible. So this is what I first tried to do:

using (FileStream filestream = path.OpenRead()) {
   using (var d = new GZipStream(filestream, CompressionMode.Decompress)) {
      using (MemoryStream m = new MemoryStream()) {
         d.CopyTo(m);
         m.Position = 0;

         using (BinaryReaderBigEndian b = new BinaryReaderBigEndian(m)) {
            while (b.BaseStream.Position != b.BaseStream.Length) {
               UInt32 value = b.ReadUInt32();
            }
         }
      }
   }
}

Where the BinaryReaderBigEndian class is implemented as follows:

public class BinaryReaderBigEndian : BinaryReader {
   public BinaryReaderBigEndian(Stream stream) : base(stream) { }

   public override UInt32 ReadUInt32() {
      var x = base.ReadBytes(4);
      Array.Reverse(x);
      return BitConverter.ToUInt32(x, 0);
   }
}

Then I tried to get a performance improvement by using ReadOnlySpan instead of MemoryStream. So I tried doing this:

using (FileStream filestream = path.OpenRead()) {
   using (var d = new GZipStream(filestream, CompressionMode.Decompress)) {
      using (MemoryStream m = new MemoryStream()) {
         d.CopyTo(m);
         int position = 0;
         ReadOnlySpan<byte> stream = new ReadOnlySpan<byte>(m.ToArray());

         while (position != stream.Length) {
            UInt32 value = stream.ReadUInt32(position);
            position += 4;
         }
      }
   }
}

Where the BinaryReaderBigEndian class changed to:

public static class BinaryReaderBigEndian {
   public static UInt32 ReadUInt32(this ReadOnlySpan<byte> stream, int start) {
      var data = stream.Slice(start, 4).ToArray();
      Array.Reverse(data);
      return BitConverter.ToUInt32(data, 0);
   }
}

But, unfortunately, I didn't notice any improvement. So where am I going wrong?


Solution

  • I did some measurements of your code on my computer (Intel Q9400, 8 GiB RAM, SSD, Win10 x64 Home, .NET Framework 4.7.2, tested with a 15 MB (when unpacked) file) with these results:

    No-Span version: 520 ms
    Span version: 720 ms

    So the Span version is actually slower! Why? Because new ReadOnlySpan<byte>(m.ToArray()) performs an additional copy of the whole file, and ReadUInt32() also performs many slicings of the Span (slicing is cheap, but not free). Since you performed more work, you can't expect performance to be any better just because you used Span.
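    If you want to see that extra copy directly, you can watch allocations around the ToArray() call. This is just a sketch; the helper name is made up, and it assumes a runtime that exposes GC.GetAllocatedBytesForCurrentThread() (.NET Core / .NET 5+); on .NET Framework a memory profiler or BenchmarkDotNet's MemoryDiagnoser gives the same picture:

    private static void MeasureToArrayCopy(MemoryStream m)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();

        // ToArray() allocates a brand-new array as large as the decompressed data and copies into it
        byte[] copy = m.ToArray();

        long after = GC.GetAllocatedBytesForCurrentThread();
        Console.WriteLine($"ToArray() allocated roughly {after - before:N0} bytes");
    }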

    So can we do better? Yes. It turns out that the slowest part of your code is actually the garbage collection caused by the 4-byte arrays that the .ToArray() call in ReadUInt32() repeatedly allocates. You can avoid it by implementing ReadUInt32() yourself. It's pretty easy and also eliminates the need for Span slicing. You can also replace new ReadOnlySpan<byte>(m.ToArray()) with new ReadOnlySpan<byte>(m.GetBuffer()).Slice(0, (int)m.Length), which performs a cheap slice instead of a copy of the whole file. So now the code looks like this:

    public static void Read(FileInfo path)
    {
        using (FileStream filestream = path.OpenRead())
        {
            using (var d = new GZipStream(filestream, CompressionMode.Decompress))
            {
                using (MemoryStream m = new MemoryStream())
                {
                    d.CopyTo(m);
                    int position = 0;
    
                    ReadOnlySpan<byte> stream = new ReadOnlySpan<byte>(m.GetBuffer()).Slice(0, (int)m.Length);
    
                    while (position != stream.Length)
                    {
                        UInt32 value = stream.ReadUInt32(position);
                        position += 4;
                    }
                }
            }
        }
    }
    
    public static class BinaryReaderBigEndian
    {
        public static UInt32 ReadUInt32(this ReadOnlySpan<byte> stream, int start)
        {
            UInt32 res = 0;
            for (int i = 0; i < 4; i++)
            {
                res = (res << 8) | (((UInt32)stream[start + i]) & 0xff);
            }
            return res;
        }
    }
    

    With these changes I got from 720 ms down to 165 ms (4x faster). Sounds great, doesn't it? But we can do even better: we can avoid the MemoryStream copy completely and inline and further optimize ReadUInt32():

    public static void Read(FileInfo path)
    {
        using (FileStream filestream = path.OpenRead())
        {
            using (var d = new GZipStream(filestream, CompressionMode.Decompress))
            {
                var buffer = new byte[64 * 1024];
    
                do
                {
                    int bufferDataLength = FillBuffer(d, buffer);
    
                    if (bufferDataLength % 4 != 0)
                        throw new Exception("Stream length not divisible by 4");
    
                    if (bufferDataLength == 0)
                        break;
    
                    for (int i = 0; i < bufferDataLength; i += 4)
                    {
                        uint value = unchecked(
                            (((uint)buffer[i]) << 24)
                            | (((uint)buffer[i + 1]) << 16)
                            | (((uint)buffer[i + 2]) << 8)
                            | (((uint)buffer[i + 3]) << 0));
                    }
    
                } while (true);
            }
        }
    }
    
    private static int FillBuffer(Stream stream, byte[] buffer)
    {
        int read = 0;
        int totalRead = 0;
        do
        {
            read = stream.Read(buffer, totalRead, buffer.Length - totalRead);
            totalRead += read;
    
        } while (read > 0 && totalRead < buffer.Length);
    
        return totalRead;
    }
    

    And now it takes less than 90 ms (8x faster than the original!). And without Span! Span is great in situations where it lets you slice and avoid copying arrays, but it won't improve performance just because you blindly use it. After all, Span is designed to have performance characteristics on par with Array, but not better (and only on runtimes that have special support for it, such as .NET Core 2.1).
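    As a side note: if you can target a runtime where System.Buffers.Binary is available (.NET Core 2.1+, or the System.Memory NuGet package), BinaryPrimitives.ReadUInt32BigEndian does the big-endian conversion for you without manual shifting or byte reversal. A minimal sketch (not benchmarked here; the class and method names are just for illustration):

    public static class BigEndianSpanExample
    {
        // Reads consecutive big-endian UInt32 values from a span using
        // System.Buffers.Binary.BinaryPrimitives, with no per-value allocations.
        public static void ReadAll(ReadOnlySpan<byte> data)
        {
            for (int position = 0; position + 4 <= data.Length; position += 4)
            {
                uint value = BinaryPrimitives.ReadUInt32BigEndian(data.Slice(position, 4));
            }
        }
    }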