Search code examples
c#linqstack-overflowinfinite-loop

Why does this method result in an infinite loop?


One of my coworkers came to me with a question about this method that results in an infinite loop. The actual code is a bit too involved to post here, but essentially the problem boils down to this:

private IEnumerable<int> GoNuts(IEnumerable<int> items)
{
    items = items.Select(item => items.First(i => i == item));
    return items;
}

This should (you would think) just be a very inefficient way to create a copy of a list. I called it with:

var foo = GoNuts(new[]{1,2,3,4,5,6});

The result is an infinite loop. Strange.

I think that modifying the parameter is, stylistically a bad thing, so I changed the code slightly:

var foo = items.Select(item => items.First(i => i == item));
return foo;

That worked. That is, the program completed; no exception.

More experiments showed that this works, too:

items = items.Select(item => items.First(i => i == item)).ToList();
return items;

As does a simple

return items.Select(item => .....);

Curious.

It's clear that the problem has to do with reassigning the parameter, but only if evaluation is deferred beyond that statement. If I add the ToList() it works.

I have a general, vague, idea of what's going wrong. It looks like the Select is iterating over its own output. That's a little bit strange in itself, because typically an IEnumerable will throw if the collection it's iterating changes.

What I don't understand, because I'm not intimately familiar with the internals of how this stuff works, is why re-assigning the parameter causes this infinite loop.

Is there somebody with more knowledge of the internals who would be willing to explain why the infinite loop occurs here?


Solution

  • The key to answering this is deferred execution. When you do this

    items = items.Select(item => items.First(i => i == item));
    

    you do not iterate the items array passed into the method. Instead, you assign it a new IEnumerable<int>, which references itself back, and starts iterating only when the caller starts enumerating the results.

    That is why all your other fixes have dealt with the problem: all you needed to do is to stop feeding IEnumerable<int> back to itself:

    • Using var foo breaks self-reference by using a different variable,
    • Using return items.Select... breaks self-reference by not using intermediate variables at all,
    • Using ToList() breaks self-reference by avoiding deferred execution: by the time items is re-assigned, old items has been iterated over, so you end up with a plain in-memory List<int>.

    But if it's feeding on itself, how does it get anything at all?

    That's right, it does not get anything! The moment you try iterating items and ask it for the first item, the deferred sequence asks the sequence fed to it for the first item to process, which means that the sequence is asking itself for the first item to process. At this point, it's turtles all the way down, because in order to return the first item to process the sequence must first get the first item to process from itself.