I tried to implement a custom LINQ Chunk function and found this code example. The function should split an IEnumerable into an IEnumerable of batches of a given size:
```csharp
using System.Collections.Generic;

public static class EnumerableExtensions
{
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                int i = 0;
                IEnumerable<T> Batch()
                {
                    do yield return enumerator.Current;
                    while (++i < size && enumerator.MoveNext());
                }
                yield return Batch();
            }
        }
    }
}
```
So, my question: why do LINQ operations on the result give incorrect results? For example:
```csharp
IEnumerable<int> list = Enumerable.Range(0, 10);
Console.WriteLine(list.Batch(2).Count()); // 10 instead of 5
```
My assumption is that this happens because the inner IEnumerable Batch() is only triggered when Count() is called, and something goes wrong there, but I don't know exactly what.
Regarding your assumption that the inner IEnumerable Batch() is only triggered when Count() is called: it's the opposite. The inner IEnumerables are not consumed at all when you call Count. Count only consumes the outer IEnumerable, which is this loop:
```csharp
while (enumerator.MoveNext())
{
    int i = 0;
    IEnumerable<T> Batch()
    {
        // the below is not executed by Count!
        // do yield return enumerator.Current;
        // while (++i < size && enumerator.MoveNext());
    }
    yield return Batch();
}
```
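So all Count does is move the enumerator to the end, counting how many times it moved it. Each outer iteration advances the shared enumerator by exactly one element, so you get one (never consumed) "batch" per element, which is 10.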
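You can check this with the small console sketch below. It copies the question's Batch and adds a single demo counter, InnerYields, which is not in the original. Tracked, Pulled and CountBatches are hypothetical helpers that count how many elements are pulled from the source:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Demo counter (not in the question): elements yielded by inner batches.
    public static int InnerYields;

    // The question's Batch, with only the InnerYields++ added.
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                int i = 0;
                IEnumerable<T> Batch()
                {
                    do { InnerYields++; yield return enumerator.Current; }
                    while (++i < size && enumerator.MoveNext());
                }
                yield return Batch();
            }
        }
    }
}

public static class Program
{
    // Hypothetical demo state: how many elements were pulled from the source.
    public static int Pulled;

    // Hypothetical wrapper that counts every element pulled from the source.
    static IEnumerable<int> Tracked(IEnumerable<int> source)
    {
        foreach (var x in source)
        {
            Pulled++;
            yield return x;
        }
    }

    public static int CountBatches()
    {
        Pulled = 0;
        EnumerableExtensions.InnerYields = 0;
        return Tracked(Enumerable.Range(0, 10)).Batch(2).Count();
    }

    public static void Main()
    {
        int batches = CountBatches();
        // Count() ran the outer loop once per source element and
        // never entered any inner Batch() iterator.
        Console.WriteLine($"batches={batches}, pulled={Pulled}, innerYields={EnumerableExtensions.InnerYields}");
        // prints: batches=10, pulled=10, innerYields=0
    }
}
```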
Compare that to how the author of this code likely intended it to be used:
```csharp
foreach (var batch in someEnumerable.Batch(2)) {
    foreach (var thing in batch) {
        // ...
    }
}
```
Here I'm also consuming the inner IEnumerables with an inner loop, which runs the code inside the inner Batch. That code yields the current element, then moves the source enumerator forward and yields the new current element, after which the ++i < size check fails. The outer loop then moves the enumerator forward again for its next iteration. That is how you get a "batch" of two elements.
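Here is a runnable sketch of that intended consumption pattern. It reuses the question's Batch unchanged and assumes a plain console app; CollectNested is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // The Batch method from the question, unchanged apart from layout.
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                int i = 0;
                IEnumerable<T> Batch()
                {
                    do yield return enumerator.Current;
                    while (++i < size && enumerator.MoveNext());
                }
                yield return Batch();
            }
        }
    }
}

public static class Program
{
    // Hypothetical helper: consumes Batch the intended way, fully draining
    // each inner batch before the outer loop asks for the next one.
    public static List<List<int>> CollectNested(IEnumerable<int> source, int size)
    {
        var batches = new List<List<int>>();
        foreach (var batch in source.Batch(size))   // outer: starts a batch at the current element
        {
            var items = new List<int>();
            foreach (var item in batch)             // inner: runs the local Batch() to completion
                items.Add(item);
            batches.Add(items);
        }
        return batches;
    }

    public static void Main()
    {
        foreach (var items in CollectNested(Enumerable.Range(0, 10), 2))
            Console.WriteLine(string.Join(" ", items));
        // prints "0 1", "2 3", "4 5", "6 7", "8 9" on separate lines
    }
}
```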
Notice that the enumerator (which came from someEnumerable) in the previous paragraph is shared between the inner and outer IEnumerables. Consuming either the inner or the outer IEnumerable moves the enumerator, and only when you consume both in a very specific way (each inner batch fully, before the outer sequence moves on) does the sequence of events in the previous paragraph happen and give you batches.
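To see what happens when you break that pattern, the sketch below reads only the first element of each batch. FirstOfEachBatch is a hypothetical helper, and Batch is the question's method unchanged. Each inner batch is abandoned after one element, so the outer loop resumes one element later instead of two:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // The Batch method from the question, unchanged apart from layout.
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                int i = 0;
                IEnumerable<T> Batch()
                {
                    do yield return enumerator.Current;
                    while (++i < size && enumerator.MoveNext());
                }
                yield return Batch();
            }
        }
    }
}

public static class Program
{
    // Hypothetical helper: takes only the first element of every batch,
    // so no inner batch ever gets to advance the shared enumerator.
    public static List<int> FirstOfEachBatch(IEnumerable<int> source, int size) =>
        source.Batch(size).Select(batch => batch.First()).ToList();

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", FirstOfEachBatch(Enumerable.Range(0, 10), 2)));
        // prints "0 1 2 3 4 5 6 7 8 9" rather than "0 2 4 6 8"
    }
}
```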
In your case, you can consume the inner IEnumerables by calling ToList:

```csharp
Console.WriteLine(list.Batch(2).Select(x => x.ToList()).Count()); // 5
```
Sharing the enumerator lets the batches be consumed lazily, but it restricts client code to consuming them in very specific ways. In .NET 6, the built-in Chunk method instead computes the batches (chunks) eagerly as arrays:

```csharp
public static IEnumerable<TSource[]> Chunk<TSource>(this IEnumerable<TSource> source, int size)
```
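For comparison, the built-in method handles the question's example directly. This sketch assumes .NET 6 or later, where Enumerable.Chunk is available; CountChunks and ChunksOfThree are hypothetical helper names:

```csharp
using System;
using System.Linq;

public static class Program
{
    // Each chunk is a fully materialized int[], so Count() sees real chunks.
    public static int CountChunks() => Enumerable.Range(0, 10).Chunk(2).Count();

    public static int[][] ChunksOfThree() => Enumerable.Range(0, 10).Chunk(3).ToArray();

    public static void Main()
    {
        Console.WriteLine(CountChunks()); // 5, unlike the question's Batch(2).Count()
        foreach (int[] chunk in ChunksOfThree())
            Console.WriteLine(string.Join(" ", chunk));
        // prints "0 1 2", "3 4 5", "6 7 8", "9": the last chunk may be shorter
    }
}
```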
You can do something similar in your Batch by calling ToArray() here:

```csharp
yield return Batch().ToArray();
```

This way, each inner IEnumerable is fully consumed before it is handed to the caller.
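Putting it together, here is a sketch of the question's Batch with that one-line change. The Program helpers are hypothetical and only there for illustration. Count now returns 5, and even partially consuming each batch, as in Select(b => b.First()), gives the first element of every batch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // The question's Batch with the ToArray() fix applied.
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                int i = 0;
                IEnumerable<T> Batch()
                {
                    do yield return enumerator.Current;
                    while (++i < size && enumerator.MoveNext());
                }
                // ToArray() drains this batch from the shared enumerator before
                // the caller sees it, so the next outer iteration starts a new batch.
                yield return Batch().ToArray();
            }
        }
    }
}

public static class Program
{
    public static int CountBatches() => Enumerable.Range(0, 10).Batch(2).Count();

    public static List<int> FirstOfEachBatch() =>
        Enumerable.Range(0, 10).Batch(2).Select(batch => batch.First()).ToList();

    public static void Main()
    {
        Console.WriteLine(CountBatches());                       // 5
        Console.WriteLine(string.Join(" ", FirstOfEachBatch())); // 0 2 4 6 8
    }
}
```

The trade-off is that each batch is now buffered in an array, so you lose laziness within a batch. To match Chunk more closely, you could also change the return type to IEnumerable<T[]>. That is a design choice, not something the question requires.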