I have to benchmark JSON serialization time and compare it to the serialization times of Thrift and Google's Protocol Buffers. It also has to be in Python.
I was planning on using the Python profilers. http://docs.python.org/2/library/profile.html
Would the profiler be the best way to find function runtimes? Or would outputting a timestamp before and after the function call be the better option?
Or is there an even better way?
From the profile docs that you linked to:
Note: The profiler modules are designed to provide an execution profile for a given program, not for benchmarking purposes (for that, there is timeit for reasonably accurate results). This particularly applies to benchmarking Python code against C code: the profilers introduce overhead for Python code, but not for C-level functions, and so the C code would seem faster than any Python one.
So, no, you do not want to use profile to benchmark your code. What you want to use profile for is to figure out why your code is too slow, after you already know that it is.
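For completeness, using the profiler for that intended job might look something like this minimal sketch (serialize and data are assumed to be defined at the top level of your script, since cProfile.run executes its command string in the __main__ namespace):

import cProfile
import pstats

# Assumes serialize and data are defined at module level, because
# cProfile.run executes its command string in the __main__ namespace.
cProfile.run('serialize(data)', 'serialize.prof')

# Show the 10 functions with the highest cumulative time.
pstats.Stats('serialize.prof').sort_stats('cumulative').print_stats(10)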
And you do not want to output a timestamp before and after the function call, either. There are just way too many things you can get wrong that way if you're not careful (using the wrong timestamp function, letting the GC run a cycle collection in the middle of your test run, including test overhead in the loop timing, etc.), and timeit takes care of all of that for you.
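For instance, timeit disables garbage collection while the timed loop runs and uses the most precise clock available on your platform. If you want extra robustness, a common pattern (sketched here with json and made-up data as the example) is to repeat the whole measurement and take the minimum:

import timeit

# repeat() runs the whole number=10000 loop three times; the minimum is
# usually the most stable figure, since anything above it is noise from
# the OS or other processes.
times = timeit.repeat('json.dumps(data)',
                      setup='import json; data = {"spam": [1, 2, 3]}',
                      repeat=3, number=10000)
print(min(times))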
Something like this is a common way to benchmark things:
import timeit

for impl in 'mycode', 'googlecode', 'thriftcode':
    # The setup string is exec'd once per call, outside the timed loop,
    # so the import and the file read don't count against the measurement.
    t = timeit.timeit('serialize(data)',
                      setup='''
from {} import serialize
with open('data.txt') as f: data = f.read()
'''.format(impl),
                      number=10000)
    print('{}: {}'.format(impl, t))
(I'm assuming here that you can write three modules that wrap the three different serialization tools in the same API: a single serialize function that takes a string and does something or other with it. Obviously there are different ways to organize things.)
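For example, the json wrapper might be as simple as this (a hypothetical mycode.py; the Thrift and protobuf wrappers would expose the same one-function API):

# mycode.py -- hypothetical wrapper exposing the common API:
# a single serialize() function taking the text read from data.txt.
import json

def serialize(data):
    # Round-trip: parse the text into Python objects, then encode them
    # back to a JSON string, so the timing exercises json's encoder.
    return json.dumps(json.loads(data))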