The crawler I am working on makes its requests using pycurl multi.
What kind of efficiency improvement can I expect if I switch to aiohttp?
I am skeptical about the potential improvement because Python has the GIL. Most of the time is spent waiting on the requests (network I/O), so if I could issue them truly in parallel and then process the responses as they come in, I could get a good speedup.
Has anyone been through this and can offer some insights?
Thanks
The global interpreter lock is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once.
This mainly affects the performance of CPU-bound multithreaded code; the GIL is released while a thread blocks on network I/O. asyncio is about handling requests concurrently rather than in parallel. With asyncio your code can handle more requests even with a single-threaded event loop, because the network I/O is asynchronous. While a coroutine is fetching a network resource it is "paused": it does not block the thread it is running on, so other coroutines can execute in the meantime. The main idea with asyncio is that even with a single thread you can keep the CPU doing useful work instead of sitting idle waiting for network I/O.
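As a rough illustration of that idea, here is a minimal sketch that runs ten simulated requests on a single thread and handles each result as soon as it is ready. Note the assumptions: `fake_fetch`, `crawl`, and `timed_crawl` are hypothetical names made up for this example, the URLs are placeholders, and `asyncio.sleep` stands in for the real network wait. In an actual crawler you would replace the body of `fake_fetch` with an aiohttp request.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float = 0.2) -> str:
    # Assumption: asyncio.sleep stands in for a real network request.
    # While this coroutine is suspended here, the event loop runs others.
    await asyncio.sleep(delay)
    return f"body of {url}"


async def crawl(urls):
    # Start every request up front so they all wait concurrently.
    tasks = [asyncio.create_task(fake_fetch(u)) for u in urls]
    results = []
    # as_completed yields results in the order they finish, so each
    # response can be processed as soon as it arrives.
    for finished in asyncio.as_completed(tasks):
        results.append(await finished)
    return results


def timed_crawl(urls):
    # Run the crawl on a fresh event loop and measure wall-clock time.
    start = time.perf_counter()
    results = asyncio.run(crawl(urls))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    urls = [f"http://example.invalid/page/{i}" for i in range(10)]
    results, elapsed = timed_crawl(urls)
    print(f"{len(results)} responses in {elapsed:.2f}s")
```

Ten simulated requests of 0.2 s each would take about 2 s one after another; here they overlap, so the total should be close to a single delay, all without any extra threads.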
If you want to understand more about asyncio, you need to understand the difference between concurrency and parallelism. This is an excellent Go talk about this subject, but the principles are the same.
So even though Python has the GIL, for I/O-bound work like this asyncio will usually perform far better than traditional threads. Here are some benchmarks: