I am trying to batch an IEnumerable<T>
into equal-sized subsets and came across the following solutions:
The MoreLinq NuGet library's Batch, whose implementation is detailed here:
MoreLinq - Batch. Source code pasted underneath:
public static IEnumerable<TResult> Batch<TSource, TResult>(this IEnumerable<TSource> source, int size,
    Func<IEnumerable<TSource>, TResult> resultSelector)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
    if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
    return BatchImpl(source, size, resultSelector);
}

private static IEnumerable<TResult> BatchImpl<TSource, TResult>(this IEnumerable<TSource> source, int size,
    Func<IEnumerable<TSource>, TResult> resultSelector)
{
    Debug.Assert(source != null);
    Debug.Assert(size > 0);
    Debug.Assert(resultSelector != null);

    TSource[] bucket = null;
    var count = 0;

    foreach (var item in source)
    {
        if (bucket == null)
        {
            bucket = new TSource[size];
        }

        bucket[count++] = item;

        // The bucket is fully buffered before it's yielded
        if (count != size)
        {
            continue;
        }

        // Select is necessary so bucket contents are streamed too
        yield return resultSelector(bucket);

        bucket = null;
        count = 0;
    }

    // Return the last bucket with all remaining elements
    if (bucket != null && count > 0)
    {
        Array.Resize(ref bucket, count);
        yield return resultSelector(bucket);
    }
}
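For reference, here is a minimal usage sketch of the pasted Batch overload (my own example, not taken from the MoreLinq documentation); each bucket is materialized as an array of at most size elements before the resultSelector is applied:

using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<int> numbers = Enumerable.Range(1, 10);

// Sum each bucket of three; the final partial bucket {10} is resized to its actual length.
foreach (var sum in numbers.Batch(3, bucket => bucket.Sum()))
    Console.WriteLine(sum); // prints 6, 15, 24, 10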
Another solution, which is more memory-efficient, is available at the following link:
IEnumerable Batching. Source code pasted underneath:
public static class BatchLinq
{
    public static IEnumerable<IEnumerable<T>> CustomBatch<T>(this IEnumerable<T> source, int size)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException("size", "Must be greater than zero.");

        using (IEnumerator<T> enumerator = source.GetEnumerator())
            while (enumerator.MoveNext())
                yield return TakeIEnumerator(enumerator, size);
    }

    private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size)
    {
        int i = 0;
        do
            yield return source.Current;
        while (++i < size && source.MoveNext());
    }
}
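For illustration, a usage sketch of CustomBatch (my own example): as long as each inner batch is fully consumed before the next one is requested, the streaming approach produces the expected groups:

using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<int> numbers = Enumerable.Range(1, 10);

// Each batch is drained by string.Join before the outer loop moves on,
// so the shared enumerator is positioned correctly for the next batch.
foreach (var batch in numbers.CustomBatch(3))
    Console.WriteLine(string.Join(",", batch)); // 1,2,3 / 4,5,6 / 7,8,9 / 10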
Both solutions provide the end result as an IEnumerable<IEnumerable<T>>.
I find a discrepancy in the following piece of code:
var result = ... // an IEnumerable<IEnumerable<T>> obtained from either method suggested above
result.Count()
Calling Count() leads to different results: it is correct for the MoreLinq Batch, but not for the other method, even though the batched content itself is correct and the same for both.
Consider the following example:
IEnumerable<int> arr = new int[10] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
For a partition size of 3:
arr.Batch(3).Count() returns 4, which is correct.
arr.CustomBatch(3).Count() returns 10, which is incorrect.
Even though the batched content is correct when we call ToList(), is the error due to the second method still being a lazy stream whose batches have not been materialized? Even so, that should not produce an incorrect count. Any views / suggestions?
The reason the second approach returns Count = 10 is that Count() enumerates only the outer sequence and never consumes the inner batches. Each iteration of while (enumerator.MoveNext()) therefore advances the source enumerator by a single element and yields a new batch, so the resulting enumerable contains 10 enumerables instead of 4.
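One way to avoid this, if losing the laziness of the individual batches is acceptable, is to buffer each batch before yielding it, so that the outer enumerator no longer depends on the inner batches being consumed. A minimal sketch (my own naming, not the code from the answer linked below):

using System;
using System.Collections.Generic;

public static class BufferedBatchExtensions
{
    public static IEnumerable<IEnumerable<T>> BufferedBatch<T>(this IEnumerable<T> source, int size)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException(nameof(size), "Must be greater than zero.");

        var bucket = new List<T>(size);
        foreach (var item in source)
        {
            bucket.Add(item);
            if (bucket.Count == size)
            {
                yield return bucket;
                bucket = new List<T>(size); // start a fresh list; the caller may still hold the yielded one
            }
        }

        if (bucket.Count > 0)
            yield return bucket; // final partial bucket
    }
}

With this variant, new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }.BufferedBatch(3).Count() returns 4.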
The higher-scored answer https://stackoverflow.com/a/13731854/2138959 in the referenced question provides a reasonable solution to the problem as well.