c# linq group-by batch-processing bucket

How to build batches/buckets with linq


I need to create batches from a lazy enumerable, with the following requirements:

  • Memory-friendly: items must be lazily loaded even within each batch (IEnumerable<IEnumerable<T>>, which excludes solutions that build arrays)
  • The solution must not enumerate the input twice (excludes solutions based on Skip() and Take())
  • The solution must not iterate through the entire input unless required (excludes solutions based on GroupBy)
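
To illustrate the second requirement, here is a minimal sketch of why batching with Skip() and Take() is ruled out. The names SkipTakeDemo, Source, Starts, and CountSourceStarts are hypothetical and exist only for this illustration; the counter records how often the lazy source is restarted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipTakeDemo
{
    // How many times the lazy source has been restarted from its first item.
    public static int Starts;

    // Hypothetical lazy source yielding 1..6; the counter is bumped each time
    // a new enumeration begins.
    public static IEnumerable<int> Source()
    {
        Starts++;
        for (int i = 1; i <= 6; i++)
            yield return i;
    }

    // Naive batching with Skip()/Take(): every batch re-runs the source.
    public static int CountSourceStarts(int batchCount, int batchSize)
    {
        Starts = 0;
        IEnumerable<int> items = Source();
        for (int batch = 0; batch < batchCount; batch++)
        {
            List<int> bucket = items.Skip(batch * batchSize).Take(batchSize).ToList();
            Console.WriteLine(string.Join(",", bucket));
        }
        return Starts;
    }

    public static void Main()
    {
        Console.WriteLine("Source enumerated " + CountSourceStarts(3, 2) + " times");
    }
}
```

With three batches of two items, the source is started three times, once per batch. For an expensive or non-repeatable source (a database cursor, a network stream) that is exactly what the requirement forbids.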

This question is similar to, but more restrictive than, other questions about batching with LINQ.


Solution

  • Originally posted by @Nick_Whaley in "Create batches in linq", although it is not the best answer there because that question was formulated differently:

    Try this:

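    // Both methods must live in a static class for the extension method to compile.
    public static IEnumerable<IEnumerable<T>> Bucketize<T>(this IEnumerable<T> items, int bucketSize)
    {
        // Dispose the source enumerator once the outer sequence is finished or abandoned.
        using (var enumerator = items.GetEnumerator())
        {
            // Each successful MoveNext() lands on the first item of a new bucket.
            while (enumerator.MoveNext())
                yield return GetNextBucket(enumerator, bucketSize);
        }
    }
    
    private static IEnumerable<T> GetNextBucket<T>(IEnumerator<T> enumerator, int maxItems)
    {
        int count = 0;
        do
        {
            // The enumerator is already positioned on an item when the loop starts.
            yield return enumerator.Current;
    
            count++;
            if (count == maxItems)
                yield break;
    
        } while (enumerator.MoveNext());
    }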
    

    The trick is to share a single old-fashioned IEnumerator<T> between the outer and inner enumerations, so that each batch continues where the previous one stopped.
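
As a usage sketch, the snippet below runs the approach against an endless source and stops after two buckets. It includes a copy of the Bucketize method above so that it compiles on its own, with one assumed change: the source enumerator is wrapped in a using statement. The names BucketizeDemo, Naturals, Pulled, and FirstBuckets are hypothetical helpers for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class BucketizeDemo
{
    // Copy of the answer's method; the using statement is an assumed addition.
    public static IEnumerable<IEnumerable<T>> Bucketize<T>(this IEnumerable<T> items, int bucketSize)
    {
        using (var enumerator = items.GetEnumerator())
        {
            while (enumerator.MoveNext())
                yield return GetNextBucket(enumerator, bucketSize);
        }
    }

    private static IEnumerable<T> GetNextBucket<T>(IEnumerator<T> enumerator, int maxItems)
    {
        int count = 0;
        do
        {
            yield return enumerator.Current;
            count++;
            if (count == maxItems)
                yield break;
        } while (enumerator.MoveNext());
    }

    // Number of items pulled from the source so far.
    public static int Pulled;

    // Hypothetical endless source: 1, 2, 3, ...
    public static IEnumerable<int> Naturals()
    {
        for (int i = 1; ; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Formats the first bucketCount buckets, consuming each one fully and in order.
    public static List<string> FirstBuckets(int bucketCount, int bucketSize)
    {
        Pulled = 0;
        var lines = new List<string>();
        int taken = 0;
        foreach (IEnumerable<int> bucket in Naturals().Bucketize(bucketSize))
        {
            lines.Add(string.Join(",", bucket));
            if (++taken == bucketCount)
                break; // stop without touching the rest of the source
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in FirstBuckets(2, 3))
            Console.WriteLine(line);
        Console.WriteLine("Items pulled from source: " + Pulled);
    }
}
```

Two buckets of three items pull exactly six items from the source, and the source is enumerated only once. Because every bucket reads from the same shared enumerator, each bucket has to be fully enumerated before the next one is requested; skipping a bucket, or materializing the outer sequence first (for example with ToList()), produces misleading results.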