c# multithreading unmanaged-memory unmanagedresources

Dispose inside a Parallel for is slower than a regular for loop. Why?


I have simplified my original issue into this test.

Using this class:

using System;
using System.Runtime.InteropServices;

public class Unmanaged : IDisposable
{
    private IntPtr unmanagedResource;

    public Unmanaged()
    {
        this.unmanagedResource = Marshal.AllocHGlobal(10 * 1024 * 1024);
    }
    public void DoSomethingWithThisClass()
    {
        Console.WriteLine($"{DateTime.Now} - {this.unmanagedResource.ToInt64()}");
    }

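    private bool disposedValue = false; // To detect redundant calls

    protected virtual void Dispose(bool disposing)
    {
        if (!disposedValue)
        {
            // There is no managed state to release, so 'disposing' is unused;
            // the unmanaged block is freed on both the Dispose and finalizer paths.
            Marshal.FreeHGlobal(unmanagedResource);
            disposedValue = true;
        }
    }

    // Finalizer: safety net in case Dispose is never called.
    ~Unmanaged()
    {
        Dispose(false);
    }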

    void IDisposable.Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }
}

I have these two tests:

public class UnitTest1
{
    const int Runs = 100000;

    [TestMethod]
    public void UsingFor()
    {
        for (var i = 0; i < Runs; i++) // Runs iterations, same as Parallel.For(0, Runs, ...)
        {
            using (var unman = new Unmanaged())
            {
                unman.DoSomethingWithThisClass();
            }
        }
    }

    [TestMethod]
    public void UsingParallelFor()
    {
        Parallel.For(0, Runs, new ParallelOptions { MaxDegreeOfParallelism = 10 },
            index => {
                using (var unman = new Unmanaged())
                {
                    unman.DoSomethingWithThisClass();
                }
            });
    }
}

UsingParallelFor generally takes about twice as long as UsingFor. According to the profiler, 62%-65% of the execution time of the Parallel.For version is spent inside FreeHGlobal, compared with only 52%-53% for the regular for loop.

I assumed that with modern memory systems this would not make much of a difference. Is there any way to handle large chunks of unmanaged memory from multiple threads? Can I change this code so that it runs multithreaded?

If I do not dispose of the memory in each iteration (a bad idea, but just to test), Parallel.For is twice as fast, but then I can only have about 4-5 of these objects (they hold large amounts of image data) open at the same time before the app crashes with, as you guessed, an out-of-memory exception.

Why does calling Dispose on separate objects from more than one thread slow things down?

I can leave them single threaded if that is the only option, but I was hoping to speed this up.

Thank you.


Solution

  • FreeHGlobal almost certainly takes a lock internally, which means only one thread in your process can run it at a time. The other threads queue up and wait. That contention adds overhead, so the parallel version is slower.

    You can make it faster by creating a single large block of unmanaged memory and running a lock-free allocator in it.
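
As a rough illustration of that idea, here is a minimal sketch of a fixed-size slot pool. It is only one possible shape for such an allocator, not a tested production design. The class name SlabPool and its members are hypothetical, and it assumes every buffer has the same size (as in the question, where each object needs 10 MB). It makes a single AllocHGlobal call up front and tracks free slots in a ConcurrentStack<IntPtr>, which .NET implements with interlocked compare-and-swap operations rather than a lock.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;

// Hypothetical helper (not from the original post): one large unmanaged block,
// carved into fixed-size slots that threads rent and return without a lock.
public sealed class SlabPool : IDisposable
{
    private readonly IntPtr block;                 // the single large allocation
    private readonly ConcurrentStack<IntPtr> free; // unused slots; push/pop use compare-and-swap
    private bool disposed;

    public SlabPool(int slotSize, int slotCount)
    {
        // One AllocHGlobal call for the whole pool instead of one per object.
        block = Marshal.AllocHGlobal(new IntPtr((long)slotSize * slotCount));
        free = new ConcurrentStack<IntPtr>();
        for (int i = 0; i < slotCount; i++)
        {
            free.Push(new IntPtr(block.ToInt64() + (long)i * slotSize));
        }
    }

    // Returns false when every slot is in use; the caller decides whether to wait.
    public bool TryRent(out IntPtr slot)
    {
        return free.TryPop(out slot);
    }

    // Hands a slot back to the pool. No call into the native heap happens here.
    public void Return(IntPtr slot)
    {
        free.Push(slot);
    }

    // Frees the whole block once. Slots must not be used after this.
    public void Dispose()
    {
        ReleaseBlock();
        GC.SuppressFinalize(this);
    }

    // Safety net in case Dispose is never called.
    ~SlabPool()
    {
        ReleaseBlock();
    }

    private void ReleaseBlock()
    {
        if (!disposed)
        {
            Marshal.FreeHGlobal(block);
            disposed = true;
        }
    }
}
```

With MaxDegreeOfParallelism = 10, a pool of ten 10 MB slots should be enough: each iteration would call TryRent, work on the slot, and call Return in a finally block, so FreeHGlobal runs once at the end instead of once per iteration. Whether this actually beats the single-threaded loop would need to be measured.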