c# .net stream buffer memorystream

Slow Stream.Read() performance


I am converting a stream to a byte array using memorystream.Read(arr, 0, length) for a 19 MB file. On machine 1 it takes approximately 1.26 seconds; on machine 2 it takes approximately 3 seconds. Why is there a difference in performance? Is it related to the machine's RAM usage or CPU? Do we need to increase the RAM?

using (var pdfContent = new MemoryStream(System.IO.File.ReadAllBytes(path))) 
{
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    byte[] buffer = new byte[pdfContent.Length];
    pdfContent.Read(buffer, 0, (int)pdfContent.Length);
    stopwatch.Stop();
    Console.WriteLine($"End Time:{stopwatch.Elapsed} ");
}

Solution

  • TL;DR: 1. The result of file operations depends heavily on your machine configuration (the type and even the model of the hard disk matter most in this kind of test). 2. You should read the file in chunks.

    Let's look a bit closer at that example. I prepared a test text file of 21042116 bytes (about 21 MB), created a new console application, and added the benchmark library BenchmarkDotNet:

    using System;
    using System.Diagnostics;
    using System.IO;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Jobs;
    using BenchmarkDotNet.Running;
    
    namespace stream_perf
    {
        [SimpleJob(RuntimeMoniker.NetCoreApp50)]
        [RPlotExporter]
        public class StreamBenchmarks
        {
            [Benchmark]
            public void Stackoverflow()
            {
                using (var pdfContent = new MemoryStream(System.IO.File.ReadAllBytes("payload.txt"))) 
                {
                    byte[] buffer = new byte[pdfContent.Length];
                    pdfContent.Read(buffer, 0, (int)pdfContent.Length);
                }
            }
        }
    
        class Program
        {
            static void Main(string[] args)
            {
                var summary = BenchmarkRunner.Run<StreamBenchmarks>();
            }
        }
    }
    
    

    In a console I ran two commands:

    dotnet build -c release
    dotnet run -c release
    

    That gave me the following result:

    BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
    Intel Core i5-8300H CPU 2.30GHz (Coffee Lake), 1 CPU, 8 logical and 4 physical cores
    .NET Core SDK=5.0.103
      [Host]        : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
      .NET Core 5.0 : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
    
    Job=.NET Core 5.0  Runtime=.NET Core 5.0
    
    |        Method |     Mean |    Error |   StdDev |
    |-------------- |---------:|---------:|---------:|
    | Stackoverflow | 24.24 ms | 0.378 ms | 0.353 ms |
    

    As you can see, on my machine it is really fast. But is it fast enough? No, it isn't, because we read the file data twice: the first time from disk into a byte array, here: System.IO.File.ReadAllBytes("payload.txt"), and the second time when we copy that array into another buffer, here: pdfContent.Read(buffer, 0, (int)pdfContent.Length);. So I added the following method to my benchmarks:

    [Benchmark]
    public void ReadChunked()
    {
        int totalBytes = 0;
        int readBytes = 0;
        using (var pdfStream = new System.IO.FileStream("payload.txt", FileMode.Open))
        {
            byte[] buffer = new byte[4096];
            while ((readBytes = pdfStream.Read(buffer)) != 0) {
                // do something with buffer
                totalBytes += readBytes;
            }
        }
    }
    

    In the new method we read the file in chunks, which gives us some advantages:

    1. We read the file once
    2. We do not need to allocate a buffer in RAM equal to the file size
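
    As an illustration of what "do something with buffer" could look like, here is a minimal sketch that counts newline bytes while reading in chunks. The class name ChunkedReader, the method CountNewlines, and the chunkSize parameter are hypothetical names chosen for this example; they are not part of the benchmark above. The loop uses only the bytes that Read actually returned, since Read may return fewer bytes than the buffer holds.

```csharp
using System;
using System.IO;

public static class ChunkedReader
{
    // Hypothetical helper: counts '\n' bytes without loading the whole file into memory.
    public static long CountNewlines(string path, int chunkSize = 4096)
    {
        long count = 0;
        byte[] buffer = new byte[chunkSize];

        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int read;
            // Read returns 0 only at the end of the stream.
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Inspect only the bytes filled by this call.
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == (byte)'\n')
                    {
                        count++;
                    }
                }
            }
        }

        return count;
    }
}
```

    A call such as ChunkedReader.CountNewlines("payload.txt") would then process the file with a single 4096-byte buffer, whatever the file size.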

    Let's look at the benchmark:

    BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
    Intel Core i5-8300H CPU 2.30GHz (Coffee Lake), 1 CPU, 8 logical and 4 physical cores
    .NET Core SDK=5.0.103
      [Host]        : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
      .NET Core 5.0 : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
    
    Job=.NET Core 5.0  Runtime=.NET Core 5.0
    
    |        Method |     Mean |    Error |   StdDev |
    |-------------- |---------:|---------:|---------:|
    | Stackoverflow | 23.85 ms | 0.149 ms | 0.132 ms |
    |   ReadChunked | 18.68 ms | 0.076 ms | 0.071 ms |
    
    

    The new method takes roughly 21% less time (18.68 ms versus 23.85 ms).
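
    To compare two machines, it may help to time the disk read and the in-memory copy separately, because the Stopwatch in the question covers only the buffer allocation and the copy out of the MemoryStream. The following is a rough sketch under that assumption; ReadTiming and Measure are hypothetical names. A single Stopwatch run also includes JIT compilation and depends on whether the OS has already cached the file, so treat the numbers as indicative only and prefer BenchmarkDotNet for real measurements.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ReadTiming
{
    // Hypothetical helper: times the disk read and the in-memory copy separately.
    public static (TimeSpan DiskRead, TimeSpan MemoryCopy) Measure(string path)
    {
        // Phase 1: read the whole file from disk (or from the OS file cache).
        var stopwatch = Stopwatch.StartNew();
        byte[] data = File.ReadAllBytes(path);
        TimeSpan diskRead = stopwatch.Elapsed;

        // Phase 2: allocate a second buffer and copy the bytes, as in the question.
        stopwatch.Restart();
        using (var memoryStream = new MemoryStream(data))
        {
            byte[] buffer = new byte[memoryStream.Length];
            memoryStream.Read(buffer, 0, buffer.Length);
        }
        TimeSpan memoryCopy = stopwatch.Elapsed;

        return (diskRead, memoryCopy);
    }
}
```

    Running ReadTiming.Measure(path) on both machines and printing the two values would show whether the gap comes from the disk or from the allocation and copy.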