Search code examples
c#.netlistlazy-evaluationenumerable

Regarding evaluation of Enumerable/List


I've been playing with Lists and Enumerables and I think I understand the basics:

  • Enumerable: The elements are evaluated each time they are consumed.
  • List: The elements are evaluated on definition and are not reevaluated at any point.

I've done some tests:

Starting with the Enumerable example:

var myList = new List<int>() { 1, 2, 3, 4, 5, 6 };
var myEnumerable = myList.Where(p =>
    {
        Console.Write($"{p} ");
        return p > 2;
    }
);

Console.WriteLine("");
Console.WriteLine("Starting");
myEnumerable.First();
Console.WriteLine("");
myEnumerable.Skip(1).First();

The output is:

Starting
1 2 3 
1 2 3 4 

If we add .ToList() after the .Where(...) then the output is:

1 2 3 4 5 6 
Starting

I also was able to have a bit of both worlds with this class:

class SingleEvaluationEnum<T>
{
    private IEnumerable<T> Enumerable;

    public SingleEvaluationEnum(IEnumerable<T> enumerable)
        => Enumerable = enumerable;

    public IEnumerable<T> Get()
    {
        if (!(Enumerable is List<T>))
            Enumerable = Enumerable.ToList().AsEnumerable();

        return Enumerable;
    }
}

You can see the output is:

Starting
1 2 3 4 5 6 

This way the evaluation is deferred until the first consumption and is not re-evaluated in the next ones. But the whole list is evaluated.

 

My question is: Is there a way to get this output?

Starting
1 2 3
4

In other words: I want myEnumerable.First() to evaluate only the necesary elements, but no more. And I want myEnumerable.Skip(1).First() to reuse the already evaluated elements.

EDIT: Clarification: I want that any "query" over the Enumerable applies to all the elements in the list. That's why (AFAIK) an Enumerator doesn't work.

Thanks!


Solution

  • LINQ is fundamentally a functional approach to working with collections. One of the assumptions is that there are no side-effects to evaluating the functions. You're violating that assumption by calling Console.Write in the function.

    There's no magic involved, just functions. IEnumerable has just one method - GetEnumerator. That's all that is needed for LINQ, and that's all that LINQ really does. For example, a naïve implementation of Where would look like this:

    public static IEnumerable<T> Where<T>(this IEnumerable<T> @this, Func<T, bool> filter)
    {
      foreach (var item in @this)
      {
        if (filter(item)) yield return item;
      }
    }
    

    A Skip might look like this:

    public static IEnumerable<T> Skip<T>(this IEnumerable<T> @this, int skip)
    {
      foreach (var item in @this)
      {
        if (skip-- > 0) continue;
    
        yield return item;
      }
    }
    

    That's all there is to it. It doesn't have any information about what IEnumerable is or represents. In fact, that's the whole point - you're abstracting those details away. There are a few optimizations in those methods, but they don't do anything smart. In the end, the difference between the List and IEnumerable in your example isn't anything fundamental - it's that myEnumerable.Skip(1) has side-effects (because myEnumerable itself has side-effects) while myList.Skip(1) doesn't. But both do the exact same thing - evaluate the enumerable, item by item. There's no other method than GetEnumerator on an enumerable, and IEnumerator only has Current and MoveNext (of those that matter for us).

    LINQ is immutable. That's one of the reasons why it's so useful. This allows you to do exactly what you're doing - query the same enumerable twice but getting the exact same result. But you're not happy with that. You want things to be mutable. Well, nothing is stopping you from making your own helper functions. LINQ is just a bunch of functions, after all - you can make your own.

    One such simple extension could be a memoized enumerable. Wrap around the source enumerable, create a list internally, and when you iterate over the source enumerable, keep adding items to the list. The next time GetEnumerator is called, start iterating over your internal list. When you reach the end, continue with the original approach - iterate over the source enumerable and keep adding to the list.

    This will allow you to use LINQ fully, just inserting Memoize() to your LINQ queries at the places where you want to avoid iterating over the source multiple times. In your example, this would be something like:

    myEnumerable = myEnumerable.Memoize();
    
    Console.WriteLine("");
    Console.WriteLine("Starting");
    myEnumerable.First();
    Console.WriteLine("");
    myEnumerable.Skip(1).First();
    

    The first call to myEnumerable.First() will iterate through the first three items in myList, and the second will only work with the fourth.