c# .net linq

Does ToLookup force immediate execution of a sequence?


I was looking into the Enumerable.ToLookup API, which converts an enumerable sequence into a dictionary-like data structure. More details can be found here:

https://msdn.microsoft.com/en-us/library/system.linq.enumerable.tolookup(v=vs.110).aspx

The main difference I can see from the ToDictionary API is that ToLookup does not throw an error when the key selector produces duplicate keys. I need a comparison of the deferred execution semantics of these two APIs. As far as I know, ToDictionary executes the sequence immediately, i.e. it does not follow the deferred execution semantics of LINQ queries. Can anyone help me with the deferred execution behavior of ToLookup? Is it the same as ToDictionary, or is there some difference?


Solution

  • Easy enough to test...

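    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class Program
    {
        static void Main()
        {
            // ToLookup enumerates its source right here, and Inf() never ends.
            var lookup = Inf().ToLookup(i => i / 100);
            Console.WriteLine("if you see this, ToLookup is deferred"); // never printed
        }

        // An endless sequence of integers.
        static IEnumerable<int> Inf()
        {
            // unchecked: i wraps around on overflow instead of throwing in a checked build
            unchecked
            {
                for (var i = 0; ; i++)
                {
                    yield return i;
                }
            }
        }
    }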
    

    To recap, ToLookup greedily consumes the source sequence without deferring, which is the same behavior as ToDictionary.
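
    If an infinite sequence feels too drastic, the same point can be checked with a finite source that counts how many items have been pulled from it. This is only a sketch: EagerDemo, Source and Pulled are hypothetical names made up for the illustration, not part of LINQ.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class EagerDemo
    {
        // Hypothetical counter: how many items have been pulled from Source().
        public static int Pulled;

        // Hypothetical finite source that records every item it yields.
        public static IEnumerable<int> Source()
        {
            for (var i = 0; i < 5; i++)
            {
                Pulled++;
                yield return i;
            }
        }

        public static void Main()
        {
            Pulled = 0;
            var lookup = Source().ToLookup(i => i % 2);   // duplicate keys are fine here
            Console.WriteLine(Pulled); // 5: ToLookup read the whole source itself

            Pulled = 0;
            var dict = Source().ToDictionary(i => i);     // keys must be unique
            Console.WriteLine(Pulled); // 5: ToDictionary is eager in the same way

            Pulled = 0;
            var query = Source().Where(i => i % 2 == 0);  // a deferred operator, for contrast
            Console.WriteLine(Pulled); // 0: nothing is read until query is enumerated
        }
    }
    ```

    Both conversion methods have to hand back a fully built collection, so they cannot postpone reading the source the way Where does.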

    In contrast, the GroupBy operator is deferred, so you can write the following to no ill effect:

    var groups = Inf().GroupBy(i => i / 100); // returns immediately; nothing is enumerated yet
    

    However, once you start enumerating the result, GroupBy is greedy: it consumes the entire source sequence before it yields the first group.

    This means that

    groups.SelectMany(g=>g).First();
    

    also fails to complete.

    This follows from the nature of grouping: when splitting a sequence into a sequence of groups, you cannot know that even one group is complete until the entire source has been consumed, because a later element might still belong to it.
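
    The "deferred, but greedy once enumerated" behavior of GroupBy can be observed with a finite, counting source. Again a sketch under assumptions: GroupByDemo, Source and Pulled are hypothetical names used only for this illustration.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class GroupByDemo
    {
        // Hypothetical counter: how many items have been pulled from Source().
        public static int Pulled;

        // Hypothetical finite source of 300 integers that records every item it yields.
        public static IEnumerable<int> Source()
        {
            for (var i = 0; i < 300; i++)
            {
                Pulled++;
                yield return i;
            }
        }

        public static void Main()
        {
            Pulled = 0;
            var groups = Source().GroupBy(i => i / 100);
            Console.WriteLine(Pulled); // 0: building the query reads nothing

            var first = groups.SelectMany(g => g).First();
            Console.WriteLine(first);  // 0: the first element of the first group
            Console.WriteLine(Pulled); // 300: the whole source was read to produce it
        }
    }
    ```

    Even though First() only needs a single element, all 300 items are pulled before the first group becomes available, which is why the same call never completes on an infinite source.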