Why is this IEnumerable extension method much slower than another (more simpler) extension method (that only iterate input)?

I have a console application that contains two methods :

public static IEnumerable<TSource>
          FooA<TSource>(this IEnumerable<IEnumerable<TSource>> source)
{
    return source.Aggregate((x, y) => x.Intersect(y));
}

public static IEnumerable<TSource> 
          FooB<TSource>(this IEnumerable<IEnumerable<TSource>> source)
{
    foreach (TSource element in source.First())
    {
        yield return element;
    }
}

What it does : both take a sequence of sequences, FooA produce the intersection set of all of them, then return result. FooB simply iterate the first sequence.

What I do not understand : FooB is more than 10 times slower than FooA, while FooB is actually a lot more simple (there is no call to Intersect() method).

Here is results :

00:00:00.0071053 (FooA)
00:00:00.0875303 (FooB)

FooB can be a lot faster by returning directly source.First(), anyway I decompiled Distinct method using ILSpy and found exactly same foreach yield return loop :

private static IEnumerable<TSource> DistinctIterator<TSource>
   (IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)
{
    Set<TSource> set = new Set<TSource>(comparer);
    foreach (TSource current in source)
    {
        if (set.Add(current))
        {
            yield return current;
        }
    }
    yield break;
}

Also : in code i'm using i cannot return source.First() (I get CS1622). What i show here is actually a much more simpler code that I stripped down for debugging.

Here is code I use for testing :

List<List<int>> foo = new List<List<int>>();
foo.Add(new List<int>(Enumerable.Range(0, 3000*1000)));

Stopwatch sa = new Stopwatch();
sa.Start();
List<int> la = FooA(foo).ToList();
Console.WriteLine(sa.Elapsed);


Stopwatch sb = new Stopwatch();
sb.Start();
List<int> lb = FooB(foo).ToList();
Console.WriteLine(sb.Elapsed);

Solution

The reason why you are measuring such a big difference is that the Aggregate call simply returns your initial list since there are no items to aggregate because your list has only one item.

If you change it to

    List<List<int>> foo = new List<List<int>>()
    {
        new List<int>(Enumerable.Range(0, 3000 * 1000)),
        new List<int>(Enumerable.Range(0, 3000 * 1000)),
    };

With only one item like you:

A: 00:00:00.0037843
B: 00:00:00.0514177

But with two items:

A: 00:00:00.2130628
B: 00:00:00.0574932

A is much slower now. The difference in the first example was due to the array allocations which did cause much more CPU cycles.

    AllocationAmount AllocationKind
B     1CAE0         Small
B     21E5C         Small
B     20020         Large
B     40020         Large
B     80020         Large
B    100020         Large
B    200020         Large
B    400020         Large
B    800020         Large
B   1000020         Large
A    B71B20         Large

This are the GC AllocationTick ETW events which are emited by the Garbage Collector. In effect you did compare apples to oranges. Your Aggregate call did basically nothing.