python performance iterator generator fizzbuzz

Why is this Fizz Buzz generator significantly faster than this Fizz Buzz Iterator class?

After learning about iterator class methods and generators, I tested the performance characteristics of simple Fizz Buzz solutions utilizing each idiom:

>>> from timeit import timeit
>>> timeit('tuple(fizzbuzz.FizzBuzzIterator(10))', 'import fizzbuzz')
13.281935930252075
>>> timeit('tuple(fizzbuzz.fizz_buzz_generator(10))', 'import fizzbuzz')
7.619534015655518

According to timeit the generator function is about 1¾ times faster than the iterator class.

My question again: Why is this Fizz Buzz generator significantly faster than this Fizz Buzz Iterator class?

Fizz Buzz iterator class

class FizzBuzzIterator:

    def __init__(self, low, high=None):
        if high is None:
            self.high = low
            self.current = 1
        else:
            self.high = high
            self.current = max(low, 1)

    def __iter__(self):
        return self

    def next(self):
        if self.current > self.high:
            raise StopIteration
        else:
            c = self.current
            self.current += 1
            if (c % 5 + c % 3) == 0:
                return 'FizzBuzz'
            elif c % 5 == 0:
                return 'Buzz'
            elif c % 3 == 0:
                return 'Fizz'
            else:
                return str(c)

Fizz Buzz generator function

def fizz_buzz_generator(low, high=None):
    if high is None:
        high = low
        cur = 1
    else:
        cur = max(low, 1)
    while cur <= high:
        c = cur
        cur += 1
        if (c % 5 + c % 3) == 0:
            yield 'FizzBuzz'
        elif c % 5 == 0:
            yield 'Buzz'
        elif c % 3 == 0:
            yield 'Fizz'
        else:
            yield str(c)

Solution

Because apparently, generators are implemented more efficiently than iterators.

Your first solution has the following interesting characteristics:

It uses objects.
For an iteration over n numbers, 3 + n methods are called and 2 + 4·n attributes are accessed, which are both potentially slow operations.
Exceptions are used for control flow.

The second solution does none of these, instead it yields which means that the language runtime performs the heavy lifting. As the runtime is usually implemented in C, this can be far more optimized than the very high-level code of your first solution.

Next, you should consider what you are actually benchmarking. For a good benchmark you should choose different input sizes n and watch how the two solutions compare at different scales.

For very small n, we expect the initalization cost to dominate. This is consistent with your results, as performing a function call is less expensive than creating an object.
For larger n, we expect the features of the algorithm to dominate. As the algorithms are exactly the same, the graphs should have the same shape. But per iteration, the first solution has a much higher cost (four attribute accesses, one method call). The two solutions would then be graphs with slightly different slopes. The exact relation of the per-iteration costs can only be assessed by obtaining a large data set of many timings for many input sizes n, and then fitting a function to that data.