Tags: python, python-3.x, performance, performance-testing

Why is Python 3.6 showing better results than 3.7?


I have the following code (it's useless, just for performance testing):

from datetime import datetime

class A:
    def __init__(self, i):
        self.i = i

start = datetime.now()
foo = {}
for i in range(10000000):
    foo[i] = A(i)

print('\nSpent: [ {} ] seconds!'.format((datetime.now() - start).total_seconds()))

The thing is, when I run it with Python 3.7 I get the following result:

Spent: [ 7.644764 ] seconds!

But when I run it with Python 3.6:

Spent: [ 6.521555 ] seconds!

So the question is: am I misunderstanding something, or is the older Python really faster and I should use the old one?

UPD: As suggested in the comments, I've used the timeit module; here are the results:

python3.7 -m timeit '"-".join(str(n) for n in range(2000000))'
1 loop, best of 5: 499 msec per loop

python3.6 -m timeit '"-".join(str(n) for n in range(2000000))'
10 loops, best of 3: 405 msec per loop

The results with timeit are still worse for 3.7. Is it really slower than 3.6?


Solution

  • Your timing method is flawed. Over a 6-7 second run a modern OS won't give Python exclusive access to the CPU; other things are happening too, as the OS switches between processes, flushes disk buffers for files being written, executes scheduled network events, etc.

    You also generate quite a lot of objects that all stay in memory, so Python has to ask the OS for additional memory pages to be allocated. How quickly that memory can be provided depends on what else your computer was doing at the time. It appears that you ran Python 3.6 second, so it could easily be that the memory freed after the Python 3.7 run was still available for the 3.6 run; recently released memory is much easier for the OS to reallocate.

    Next, you used a rather imprecise wall-clock timer to time your performance. datetime.now() is fine for humans who want to know the current time, but it is not suitable for measuring performance; Python offers better, more specialised clocks for that task, such as time.perf_counter(). Python also has a background task, the garbage collector, that wants some time to do its work, affecting how Python performs the tasks you gave it (see the first sketch at the end of this answer).

    Instead, you need to separate the different problems Python has to solve here into separate tests. Run those tests under controlled circumstances, with an accurate clock, and with as many distractions as possible disabled. Run them many, many times, then take either an average time (if an aggregate is all you have) or the best time from many repeats.

    Python has a library for this, called timeit. Use it to only create the instances, not to store them all in a dictionary as well; as stated before, memory allocation is subject to the OS's timings, not Python's. Make sure your tests are repeated: if -m timeit only manages to run a test once, you really can't trust the timings, so reduce the work done in the benchmark. The second sketch at the end of this answer shows one way to set this up.

    Next, if your goal is to compare Python 3.6 vs 3.7 in general performance terms and not on a specific microbenchmark, then you'll need to run a wide range of tests. Stuff changes all the time from one 3.x release to the next. Don't base anything on a single string-join or instance-creation test. And know that the Python developers have already done all that work: see https://speed.python.org/ for a full suite of benchmarks and timings that the core team uses to monitor performance, or see the PyPerformance suite for another such benchmark.
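
For reference, here is a minimal sketch of the first point: it keeps the original class and loop count, but swaps datetime.now() for time.perf_counter() and switches off the garbage collector around the timed region. The exact structure is just an illustrative assumption, not the one true way to benchmark.

import gc
import time

class A:
    def __init__(self, i):
        self.i = i

gc.disable()                   # keep the collector from running mid-measurement
start = time.perf_counter()    # monotonic, high-resolution clock meant for timing

foo = {}
for i in range(10000000):
    foo[i] = A(i)

elapsed = time.perf_counter() - start
gc.enable()

print('Spent: [ {:.6f} ] seconds!'.format(elapsed))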
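
And a sketch of the second point: using the timeit module to benchmark only the instance creation, repeated several times, keeping the best (minimum) run. The number=1000000 and repeat=5 values are arbitrary choices for illustration.

import timeit

setup = """
class A:
    def __init__(self, i):
        self.i = i
"""

# Time only the instantiation; repeat the measurement and keep the best run,
# since the minimum is the least disturbed by other activity on the machine.
times = timeit.repeat('A(1)', setup=setup, number=1000000, repeat=5)
print('Best of 5: {:.3f} seconds per 1,000,000 instantiations'.format(min(times)))

Note that timeit temporarily disables the garbage collector while timing by default, which is exactly the kind of controlled environment described above.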