Tags: c#, .net, performance, .net-core, benchmarkdotnet

Why are my MemoryManager and MemoryPool implementations so slow?


For learning purposes, I am trying to implement a MemoryManager and a MemoryPool and to see how they compete with the standard implementation. My MemoryManager in particular is very slow. Can someone point me in the right direction as to what is happening here, or what is wrong?

My MemoryPool:

internal abstract class ByteMemoryPool : MemoryPool<byte>
{
    private const int POOL_USAGE_BORDER_BYTES = 85000;

    public override int MaxBufferSize => Int32.MaxValue;

    public new static ByteMemoryPool.Impl Shared { get; } = new ByteMemoryPool.Impl();

    public override IMemoryOwner<byte> Rent(int minBufferSize = -1)
    {
        return RentCore(minBufferSize);
    }

    protected override void Dispose(bool disposing)
    {

    }

    private Rental RentCore(int minBufferSize)
    {
        return new Rental(minBufferSize);
    }

    public sealed class Impl : ByteMemoryPool
    {
        public new Rental Rent(int minBufferSize) => RentCore(minBufferSize);
    }

    public struct Rental : IMemoryOwner<byte>
    {
        private byte[]? _array;
        private readonly bool _notRented;

        public Rental(int minBufferSize)
        {
            if (minBufferSize < POOL_USAGE_BORDER_BYTES)
            {
                _array = new byte[minBufferSize];
                _notRented = true;
            }
            else
            {
                _array = ArrayPool<byte>.Shared.Rent(minBufferSize);
                _notRented = false;
            }
        }

        public Memory<byte> Memory
        {
            get
            {
                if (_array == null)
                    throw new ObjectDisposedException(nameof(_array));

                return new Memory<byte>(_array);
            }
        }

        public void Dispose()
        {
            if (_array != null && !_notRented)
            {
                ArrayPool<byte>.Shared.Return(_array, true);
                _array = null;
            }
            else
            {
                _array = null;
            }
        }
    }
}

My MemoryManager:

internal sealed class NativeByteMemoryManager : MemoryManager<byte>
{
    private IntPtr _memoryPtr;
    private readonly int _length;

    public unsafe NativeByteMemoryManager(int length)
    {
        _length = length;
        _memoryPtr = Marshal.AllocHGlobal(length);
        Unsafe.InitBlock((void*)_memoryPtr, 0, (uint)_length);
    }

    public override Memory<byte> Memory => CreateMemory(_length);

    public override unsafe Span<byte> GetSpan()
    {
        return new Span<byte>(_memoryPtr.ToPointer(), _length);
    }

    public override unsafe MemoryHandle Pin(int elementIndex = 0)
    {
        void* pointer = (void*) ((byte*) _memoryPtr + elementIndex);
        return new MemoryHandle(pointer, default, this);
    }

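    public override void Unpin()
    {
        // Native memory never moves, so there is nothing to unpin.
        // The buffer must not be freed here: MemoryHandle.Dispose() calls Unpin(),
        // and the memory is released once in Dispose(bool).
    }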

    protected override void Dispose(bool disposing)
    {
        if (_memoryPtr != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_memoryPtr);
            _memoryPtr = IntPtr.Zero;
        }
    }
}

The benchmark:

public class MemoryManagerBenchmark
{
    [Params(1000, 8000, 64000, 4000000)]
    [System.Diagnostics.CodeAnalysis.SuppressMessage("Design", "CA1051:Do not declare visible instance fields", Justification = "<Pending>")]
    public int ArraySize;

    [Benchmark(Baseline = true)]
    public int MemoryPoolDefault()
    {
        var x = ArrayPool<byte>.Shared.Rent(ArraySize);
        var l = x.Length;
        ArrayPool<byte>.Shared.Return(x, true);
        return l;
    }

    [Benchmark(Baseline = false)]
    public int MemoryPoolByte()
    {
        using var x = ByteMemoryPool.Shared.Rent(ArraySize);
        var l = x.Memory.Length;
        return l;
    }

    [Benchmark(Baseline = false)]
    public int MemoryManager()
    {
        using var x = new NativeByteMemoryManager(ArraySize);
        var l = x.Memory.Length;
        return l;
    }
}

And the results:

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17763.107 (1809/October2018Update/Redstone5)
Intel Core i7-2600 CPU 3.40GHz (Sandy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100
  [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT
  Job-MXYBLG : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT

Force=False  IterationCount=15  LaunchCount=2  
WarmupCount=10  

            Method | ArraySize |           Mean |         Error |        StdDev |          Median | Kurtosis | Skewness | Ratio | RatioSD | Rank | Baseline |    Gen 0 |    Gen 1 |    Gen 2 | Allocated |
------------------ |---------- |---------------:|--------------:|--------------:|----------------:|---------:|---------:|------:|--------:|-----:|--------- |---------:|---------:|---------:|----------:|
     MemoryManager |      1000 |       187.3 ns |     28.466 ns |     41.726 ns |       168.24 ns |    1.210 |   0.1370 |  1.88 |    0.45 |    3 |       No |   0.0076 |        - |        - |      32 B |
    MemoryPoolByte |      1000 |       123.1 ns |      6.342 ns |      9.492 ns |       121.44 ns |    1.730 |   0.4267 |  1.24 |    0.14 |    2 |       No |   0.2447 |        - |        - |    1024 B |
 MemoryPoolDefault |      1000 |       100.2 ns |      5.226 ns |      7.821 ns |        97.50 ns |    2.284 |   0.6929 |  1.00 |    0.00 |    1 |      Yes |        - |        - |        - |         - |
                   |           |                |               |               |                 |          |          |       |         |      |          |          |          |          |           |
     MemoryManager |      8000 |       374.1 ns |     25.279 ns |     37.054 ns |       349.88 ns |    1.264 |   0.2485 |  1.54 |    0.22 |    2 |       No |   0.0076 |        - |        - |      32 B |
    MemoryPoolByte |      8000 |       842.4 ns |     12.637 ns |     18.523 ns |       839.46 ns |    2.485 |   0.7287 |  3.46 |    0.26 |    3 |       No |   1.9150 |        - |        - |    8024 B |
 MemoryPoolDefault |      8000 |       245.1 ns |     12.542 ns |     17.988 ns |       236.31 ns |    5.935 |   1.9246 |  1.00 |    0.00 |    1 |      Yes |        - |        - |        - |         - |
                   |           |                |               |               |                 |          |          |       |         |      |          |          |          |          |           |
     MemoryManager |     64000 |     2,311.8 ns |     87.763 ns |    131.359 ns |     2,266.83 ns |    2.146 |   0.6641 |  1.06 |    0.06 |    2 |       No |   0.0076 |        - |        - |      32 B |
    MemoryPoolByte |     64000 |     5,351.5 ns |     82.720 ns |    118.634 ns |     5,298.23 ns |    4.749 |   1.5884 |  2.46 |    0.14 |    3 |       No |  15.1443 |        - |        - |   64024 B |
 MemoryPoolDefault |     64000 |     2,187.6 ns |     83.603 ns |    125.133 ns |     2,102.50 ns |    2.154 |   0.9189 |  1.00 |    0.00 |    1 |      Yes |        - |        - |        - |         - |
                   |           |                |               |               |                 |          |          |       |         |      |          |          |          |          |           |
     MemoryManager |   4000000 | 2,188,789.3 ns | 65,843.021 ns | 98,550.733 ns | 2,165,661.52 ns |    4.130 |   1.3955 | 10.78 |    0.72 |    2 |       No |        - |        - |        - |      32 B |
    MemoryPoolByte |   4000000 |   199,434.5 ns |  2,634.057 ns |  3,777.686 ns |   198,360.50 ns |    3.567 |   0.9854 |  0.98 |    0.04 |    1 |       No | 999.7559 | 999.7559 | 999.7559 |         - |
 MemoryPoolDefault |   4000000 |   203,299.8 ns |  3,986.979 ns |  5,967.523 ns |   201,993.74 ns |    3.340 |   0.7295 |  1.00 |    0.00 |    1 |      Yes | 999.7559 | 999.7559 | 999.7559 |         - |

Solution

  • A benchmarking tool can only tell you how long it takes to run the given code. A profiler can tell you why it takes so long. So you need to profile your code to find out.

    I've described a sample performance investigation using BenchmarkDotNet and PerfView on my blog; it could be a good starting point for you: https://adamsitnik.com/Sample-Perf-Investigation/

    If PerfView does not meet your requirements, you can try other profilers: https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md
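
    As a rough sketch of a first step, and assuming the BenchmarkDotNet.Diagnostics.Windows NuGet package is referenced and the benchmarks run on Windows (ETW tracing usually needs an elevated prompt), the [EtwProfiler] attribute asks BenchmarkDotNet to record a trace for each benchmark, which can then be opened in PerfView. The class below reuses only the baseline method from the question; the Program class is added purely for illustration.

```csharp
using System.Buffers;
using BenchmarkDotNet.Attributes;
// Assumption: the BenchmarkDotNet.Diagnostics.Windows package is installed.
using BenchmarkDotNet.Diagnostics.Windows.Configs;
using BenchmarkDotNet.Running;

[EtwProfiler]      // records an ETW trace per benchmark (Windows only)
[MemoryDiagnoser]  // reports GC collections and allocated bytes per operation
public class MemoryManagerBenchmark
{
    [Params(1000, 8000, 64000, 4000000)]
    public int ArraySize;

    [Benchmark(Baseline = true)]
    public int MemoryPoolDefault()
    {
        byte[] x = ArrayPool<byte>.Shared.Rent(ArraySize);
        int l = x.Length;
        // clearArray: true zeroes the buffer on return, as in the question.
        ArrayPool<byte>.Shared.Return(x, clearArray: true);
        return l;
    }
}

// Hypothetical entry point, not part of the question's code.
public static class Program
{
    public static void Main(string[] args) => BenchmarkRunner.Run<MemoryManagerBenchmark>();
}
```

    The remaining benchmark methods from the question can be added to the same class unchanged; the traces should then show where the time inside each method is actually spent.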