python performance profiling benchmarking cpython

Is it REALLY true that Python code runs faster in a function?

I saw a comment that lead me to the question Why does Python code run faster in a function?.

I got to thinking, and figured I would try it myself using the timeit library, however I got very different results:

(note: 10**8 was changed to 10**7 to make things a little bit speedier to time)

>>> from timeit import repeat
>>> setup = """
def main():
    for i in xrange(10**7):
        pass
"""
>>> stmt = """
for i in xrange(10**7):
    pass
"""
>>> min(repeat('main()', setup, repeat=7, number=10))
1.4399558753975725
>>> min(repeat(stmt, repeat=7, number=10))
1.4410973942722194
>>> 1.4410973942722194 / 1.4399558753975725
1.000792745732109

Did I use timeit correctly?
Why are these results less 0.1% different from each other, while the results from the other question were nearly 250% different?
Does it only make a difference when using ~~CPython~~ compiled versions of Python (like Cython)?
Ultimately: is Python code really faster in a function, or does it just depend on how you time it?

Solution

The flaw in your test is the way timeit compiles the code of your stmt. It's actually compiled within the following template:

template = """
def inner(_it, _timer):
    %(setup)s
    _t0 = _timer()
    for _i in _it:
        %(stmt)s
    _t1 = _timer()
    return _t1 - _t0
"""

Thus stmt is actually running in a function, using the fastlocals array (i.e. STORE_FAST).

Here's a test with your function in the question as f_opt versus the unoptimized compiled stmt executed in the function f_no_opt:

>>> code = compile(stmt, '<string>', 'exec')
>>> f_no_opt = types.FunctionType(code, globals())

>>> t_no_opt = min(timeit.repeat(f_no_opt, repeat=10, number=10))
>>> t_opt = min(timeit.repeat(f_opt, repeat=10, number=10))
>>> t_opt / t_no_opt
0.4931101445632647