python performance-testing trace python-asyncio

How to measure Python's asyncio code performance?

I can't use normal tools and technics to measure the performance of a coroutine because the time it takes at await should not be taken in consideration (or it should just consider the overhead of reading from the awaitable but not the IO latency).

So how do measure the time a coroutine takes ? How do I compare 2 implementations and find the more efficent ? What tools do I use ?

Solution

This answer originally contained two different solutions: the first one was based on monkey-patching and the second one does not work for python 3.7 and onward. This new version hopefully presents a better, more robust approach.

First off, standard timing tools such as time can be used to determine the CPU time of a program, which is usually what we're interested in when testing the performance of an asynchronous application. Those measurements can also be performed in python using the time.process_time() function:

import time

real_time = time.time()
cpu_time = time.process_time()

time.sleep(1.)
sum(range(10**6))

real_time = time.time() - real_time
cpu_time = time.process_time() - cpu_time

print(f"CPU time: {cpu_time:.2f} s, Real time: {real_time:.2f} s")

See below the similar output produced by both methods:

$ /usr/bin/time -f "CPU time: %U s, Real time: %e s" python demo.py
CPU time: 0.02 s, Real time: 1.02 s  # python output
CPU time: 0.03 s, Real time: 1.04 s  # `time` output

In an asyncio application, it might happen that some synchronous part of the program ends up performing a blocking call, effectively preventing the event loop from running other tasks. So we might want to record separately the time the event loop spends waiting from the time taken by other IO tasks.

This can be achieved by subclassing the default selector to perform some timing operation and using a custom event loop policy to set everything up. This code snippet provides such a policy along with a context manager for printing different time metrics.

async def main():
    print("~ Correct IO management ~")
    with print_timing():
        await asyncio.sleep(1)
        sum(range(10**6))
    print()

    print("~ Incorrect IO management ~")
    with print_timing():
        time.sleep(0.2)
        await asyncio.sleep(0.8)
        sum(range(10**6))
    print()

asyncio.set_event_loop_policy(TimedEventLoopPolicy())
asyncio.run(main(), debug=True)

Note the difference between those two runs:

~ Correct IO management ~
CPU time:      0.016 s
Select time:   1.001 s
Other IO time: 0.000 s
Real time:     1.017 s

~ Incorrect IO management ~
CPU time:      0.016 s
Select time:   0.800 s
Other IO time: 0.200 s
Real time:     1.017 s

Also notice that the asyncio debug mode can detect those blocking operations:

Executing <Handle <TaskWakeupMethWrapper object at 0x7fd4835864f8>(<Future finis...events.py:396>) created at ~/miniconda/lib/python3.7/asyncio/futures.py:288> took 0.243 seconds