Tags: c#, asynchronous, iterator, garbage-collection, iasyncenumerable

Does enumerating an auto-generated IAsyncEnumerable release its current item before awaiting the next?


Simple example:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public abstract class Example {
    protected abstract Task<ReallyLargeItem> GetItem();
    
    public async IAsyncEnumerable<ReallyLargeItem> Producer() {
        while(true) {
            yield return await GetItem();
        }
    }

    public async Task Consumer()
    {
      await foreach(var item in Producer())
        Console.WriteLine(item == null ? "hello" : "world"); // Just print something silly to signify the JIT cannot optimise this away here
    }
}

Let's say

  • Producer yields a very large item almost immediately after Consumer starts enumerating.
  • Then nothing happens for the next hour: Producer does not yield any more items during that time.

Will the item become free-able (by the GC) pretty much immediately once Consumer has printed its first line and hits the second awaitable iteration of the loop?

Or would its generated enumerator hold on to a reference to the first item until Producer actually yields a second item, meaning for at least an hour?

I'm aware SharpLab has the answer in theory, but looking at the generated state machines, I'm unfortunately not very hopeful I can decipher them.


Solution

  • Will the item become free-able (by the GC) pretty much immediately once Consumer has printed its first line and hits the second awaitable iteration of the loop?

    No, it won't become eligible for garbage collection. Iterators in C# implement the IEnumerable<T>/IAsyncEnumerable<T> interfaces, and consuming them means obtaining an enumerator (IEnumerator<T>/IAsyncEnumerator<T>) and making two calls in each iteration. First you call MoveNext/MoveNextAsync to learn whether the sequence has any more elements, and then you call the get accessor of the Current property to get the next available item. Getting Current is not a mutating operation: Current is not nullified after you read it. You can read Current as many times as you want, and every time you'll get the same element. From the docs:

    Current returns the same object until MoveNext is called. MoveNext sets Current to the next element.
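    To make that contract concrete, here is a small synchronous sketch (the class and method names are made up for illustration) that reads Current repeatedly between MoveNext calls. Reading Current neither advances nor clears it; only MoveNext changes what it returns:

```csharp
using System.Collections.Generic;

public static class CurrentIsStable
{
    // A tiny compiler-generated iterator (hypothetical example sequence).
    private static IEnumerable<string> Letters()
    {
        yield return "a";
        yield return "b";
    }

    // Reads Current several times between MoveNext calls and records what it saw.
    public static List<string> Observe()
    {
        var seen = new List<string>();
        using IEnumerator<string> e = Letters().GetEnumerator();
        e.MoveNext();
        seen.Add(e.Current); // "a"
        seen.Add(e.Current); // still "a": reading Current does not advance or clear it
        e.MoveNext();
        seen.Add(e.Current); // "b": only MoveNext replaces Current
        return seen;
    }
}
```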

    So when exactly is Current replaced by the next value of the sequence? It is replaced when the next MoveNextAsync operation completes asynchronously. In other words, it is replaced when the returned ValueTask<bool> completes, not when the MoveNextAsync operation is launched. This can be demonstrated experimentally. For example, let's observe the behavior of the sequence below:

    async IAsyncEnumerable<int> Produce()
    {
        yield return 1;
        await Task.Delay(1000);
        yield return 2;
    }
    

    This sequence produces the value 1 immediately, and the value 2 after a delay of one second. The complete experiment can be found here. It produces this output:

    11:49:47.150 [1] > 1st MoveNextAsync result: True
    11:49:47.170 [1] > Current: 1
    11:49:47.174 [1] > 2nd MoveNextAsync started
    11:49:47.175 [1] > Current: 1
    11:49:47.275 [1] > Current: 1
    11:49:47.375 [1] > Current: 1
    11:49:47.475 [1] > Current: 1
    11:49:47.576 [1] > Current: 1
    11:49:47.676 [1] > Current: 1
    11:49:47.776 [1] > Current: 1
    11:49:47.876 [1] > Current: 1
    11:49:47.977 [1] > Current: 1
    11:49:48.077 [1] > Current: 1
    11:49:48.177 [1] > 2nd MoveNextAsync result: True
    11:49:48.177 [1] > Current: 2
    

    We can see that while the 2nd MoveNextAsync operation is in flight, Current is stuck at the value 1. This means that in your case the ReallyLargeItem will not be eligible for garbage collection during the prolonged period (possibly hours) in which the producer has nothing to produce and the pending MoveNextAsync operation remains incomplete.
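    A minimal sketch of a similar experiment follows. It is an approximation, not the original code: the class name is hypothetical and the delay is shortened to 200 ms. It starts the second MoveNextAsync without awaiting it and reads Current while that operation is still pending:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PendingMoveNextExperiment
{
    private static async IAsyncEnumerable<int> Produce()
    {
        yield return 1;
        await Task.Delay(200); // shorter than the 1000 ms above, to keep the sketch quick
        yield return 2;
    }

    public static async Task<(int BeforeSecond, int WhilePending, bool SecondResult, int AfterSecond)> RunAsync()
    {
        await using IAsyncEnumerator<int> e = Produce().GetAsyncEnumerator();
        await e.MoveNextAsync();
        int beforeSecond = e.Current;               // 1

        ValueTask<bool> second = e.MoveNextAsync(); // started, but not awaited yet
        int whilePending = e.Current;               // expected to still be 1: the old item is still referenced

        bool secondResult = await second;
        int afterSecond = e.Current;                // 2, once the ValueTask<bool> has completed
        return (beforeSecond, whilePending, secondResult, afterSecond);
    }
}
```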

    As a side note, you might be surprised to learn that Current is not nullified when you Dispose/DisposeAsync the enumerator either. See this question for details. This questionable behavior is unlikely to change, because of backward-compatibility concerns.

    If you are enumerating memory-heavy objects, probably the best you can do is to store them in a mutable wrapper, and unlink the heavy object from the wrapper as soon as you have finished working with it. A handy class to use as a wrapper is StrongBox<T>, from the System.Runtime.CompilerServices namespace.

    public async IAsyncEnumerable<StrongBox<ReallyLargeItem>> Producer()
    {
        while (true)
        {
            yield return new(await GetItem());
        }
    }
    
    public async Task Consumer()
    {
        await foreach (var box in Producer())
        {
            Console.WriteLine(box.Value is null ? "hello" : "world");
            box.Value = default; // Let the ReallyLargeItem be recycled.
        }
    }
    
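    A small sketch of what the wrapper buys you, using a byte array as a hypothetical stand-in for ReallyLargeItem and a short delay in place of the quiet hour: while the next item is pending, the enumerator's Current still points at the same StrongBox, but the box no longer points at the payload, so the payload is no longer reachable through the enumerator.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class StrongBoxDemo
{
    // Hypothetical producer: a large byte[] stands in for ReallyLargeItem.
    private static async IAsyncEnumerable<StrongBox<byte[]>> Produce()
    {
        yield return new StrongBox<byte[]>(new byte[10_000_000]);
        await Task.Delay(100); // simulates the long quiet period before the next item
        yield return new StrongBox<byte[]>(new byte[10]);
    }

    public static async Task<(bool SameBox, bool PayloadCleared)> RunAsync()
    {
        await using var e = Produce().GetAsyncEnumerator();
        await e.MoveNextAsync();

        StrongBox<byte[]> box = e.Current;
        box.Value = null; // the consumer unlinks the heavy payload when done with it

        ValueTask<bool> next = e.MoveNextAsync(); // pending during the "quiet period"
        // The enumerator still references the box, but the box no longer references the payload.
        bool sameBox = ReferenceEquals(e.Current, box);
        bool payloadCleared = e.Current.Value is null;

        await next;
        return (sameBox, payloadCleared);
    }
}
```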

    The alternative is to give up the convenience of iterators and implement your enumerables by hand.
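    A minimal hand-written sketch, assuming the producer can be expressed as a Func<Task<T>> that mirrors GetItem() from the question. ReleasingProducer is a hypothetical name, not a library type. Unlike a compiler-generated iterator, its enumerator drops the reference to the previous item before awaiting the next one:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical hand-written replacement for the iterator-based Producer().
public sealed class ReleasingProducer<T> : IAsyncEnumerable<T> where T : class
{
    private readonly Func<Task<T>> _getItem; // stands in for GetItem() from the question

    public ReleasingProducer(Func<Task<T>> getItem) => _getItem = getItem;

    public IAsyncEnumerator<T> GetAsyncEnumerator(CancellationToken cancellationToken = default)
        => new Enumerator(_getItem, cancellationToken);

    private sealed class Enumerator : IAsyncEnumerator<T>
    {
        private readonly Func<Task<T>> _getItem;
        private readonly CancellationToken _token;
        private T? _current;

        public Enumerator(Func<Task<T>> getItem, CancellationToken token)
        {
            _getItem = getItem;
            _token = token;
        }

        // Reads as null while a MoveNextAsync is in flight (see the note below).
        public T Current => _current!;

        public async ValueTask<bool> MoveNextAsync()
        {
            _current = null; // release the previous item *before* waiting for the next one
            _token.ThrowIfCancellationRequested();
            _current = await _getItem().ConfigureAwait(false);
            return true;     // infinite sequence, like the while(true) loop in the question
        }

        public ValueTask DisposeAsync()
        {
            _current = null; // also release the last item on disposal
            return default;
        }
    }
}
```

    Inside the question's Example class this could be consumed as `await foreach (var item in new ReleasingProducer<ReallyLargeItem>(GetItem))`. The trade-off is that Current reads null, despite the non-nullable T, while a MoveNextAsync is in flight, so a caller that peeks at Current between calls has to tolerate that.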