The Problem Description:
I have this custom "checksum" function:
NORMALIZER = 0x10000

def get_checksum(part1, part2, salt="trailing"):
    """Returns a checksum of two strings."""
    combined_string = part1 + part2 + " " + salt if part2 != "***" else part1
    ords = [ord(x) for x in combined_string]
    checksum = ords[0]  # initial value
    # TODO: document the logic behind the checksum calculations
    iterator = zip(ords[1:], ords)
    checksum += sum(x + 2 * y if counter % 2 else x * y
                    for counter, (x, y) in enumerate(iterator))
    checksum %= NORMALIZER
    return checksum
I want to benchmark this function on both Python 3.6 and PyPy to see whether it runs faster on PyPy, but I'm not sure what the most reliable and clean way to do that is.
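Before comparing timings it may be worth confirming that both interpreters compute the same value. A minimal sanity check, assuming the function is saved in test.py as in the commands below (the script name here is arbitrary):

# sanity_check.py -- hypothetical helper script
from test import get_checksum

# Same shape of input as the benchmark, just smaller.
print(get_checksum('test1' * 1000, 'test2' * 1000))

# Run under both interpreters and compare the printed values:
#   $ python3.6 sanity_check.py
#   $ pypy sanity_check.py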
What I've tried and the Question:
Currently, I'm using timeit for both:
$ python3.6 -mtimeit -s "from test import get_checksum" "get_checksum('test1' * 100000, 'test2' * 100000)"
10 loops, best of 3: 329 msec per loop
$ pypy -mtimeit -s "from test import get_checksum" "get_checksum('test1' * 100000, 'test2' * 100000)"
10 loops, best of 3: 104 msec per loop
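For reference, the same measurement can also be driven from a small script with the standard library's timeit.repeat, which keeps the setup identical under every interpreter; a sketch, again assuming the module is named test.py:

# bench_timeit.py -- hypothetical harness using only the standard library
import timeit

SETUP = "from test import get_checksum"
STMT = "get_checksum('test1' * 100000, 'test2' * 100000)"

# repeat=3, number=10 mirrors the command-line runs above.
times = timeit.repeat(STMT, setup=SETUP, repeat=3, number=10)
print("best of 3: %.0f msec per loop" % (min(times) / 10 * 1000))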
My concern is I'm not absolutely sure if timeit is the right tool for the job on PyPy because of the potential JIT warmup overhead.
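One way to see how much JIT warmup actually matters here is to time individual calls in a loop and watch the per-call times settle; a rough sketch, again assuming test.py:

# warmup_check.py -- hypothetical script to visualize warmup behaviour
import time
from test import get_checksum

a = 'test1' * 100000
b = 'test2' * 100000

for i in range(10):
    start = time.perf_counter()
    get_checksum(a, b)
    elapsed = time.perf_counter() - start
    # On PyPy the first few iterations are typically slower while the JIT
    # compiles the hot code; later iterations should settle to a steady value.
    print(i, "%.3f s" % elapsed)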
Plus, PyPy itself reports the following before the test results:
WARNING: timeit is a very unreliable tool. use perf or something else for real measurements
pypy -m pip install perf
pypy -m perf timeit -s 'from test import get_checksum' "get_checksum('test1' * 1000000, 'test2' * 1000000)"
What would be the best and most accurate approach to benchmarking the exact same function across these and potentially other Python implementations?
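One option that treats every interpreter the same way is to follow that warning and drive the benchmark through the perf package (published nowadays as pyperf), which spawns worker processes and performs warmup runs that are excluded from the reported values, so PyPy's JIT warmup is accounted for. A sketch, assuming the function lives in test.py and that the package is installed for each interpreter:

# bench_perf.py -- hypothetical script; on older installs the package and
# module are named "perf" rather than "pyperf"
import pyperf
from test import get_checksum

runner = pyperf.Runner()
# bench_func repeatedly calls get_checksum(*args), handling calibration,
# warmup runs and multiple worker processes.
runner.bench_func('get_checksum',
                  get_checksum, 'test1' * 100000, 'test2' * 100000)

Running python3.6 bench_perf.py and pypy bench_perf.py then produces directly comparable mean and standard deviation figures.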
You could increase the number of repetitions with the --repeat parameter in order to improve timing accuracy. See: