python, python-asyncio

Do async requests with a limited number of concurrent requests generally run faster?


I was playing with async Python code, trying to improve its performance, and noticed that when I set a limit on the number of simultaneously executing tasks via a Semaphore, the code usually runs faster than if I don't set any limit and just let it make as many requests as it likes. I also noticed that when I limit the number of web requests, I get connection errors less often. Is this the general case?
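For reference, here is a minimal sketch of the pattern the question describes, assuming asyncio with aiohttp as the HTTP client; the URL list and the limit of 100 are placeholders. The Semaphore caps how many requests are in flight at once:

    import asyncio
    import aiohttp

    async def fetch(session, semaphore, url):
        # Wait for a free slot before starting the request.
        async with semaphore:
            async with session.get(url) as resp:
                return await resp.text()

    async def main(urls, max_concurrency=100):
        semaphore = asyncio.Semaphore(max_concurrency)
        async with aiohttp.ClientSession() as session:
            tasks = [fetch(session, semaphore, url) for url in urls]
            return await asyncio.gather(*tasks)

    results = asyncio.run(main(["https://example.com"] * 10))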


Solution

  • async functions let us run several tasks concurrently, but they don't spawn any new threads or processes. They all run in the same, single thread of the Python interpreter. That is to say, the interpreter can only run as fast as a single core of your processor, and the tasks can only run as fast as the interpreter. (This is the meaning of "concurrent", as distinct from "parallel", where multiple physical processors are in use at the same time.)

    That's why folks say that async is good for I/O, and nothing else. It doesn't add computation power; it only lets us do a lot of otherwise-blocking work, like HTTP requests, concurrently. It "recycles" the blocking time that would otherwise be spent idling on the CPU, waiting for the network to respond.

    So by "recycling" the CPU time that would otherwise be wasted, more tasks increases requests/second, but only up to a point. Eventually, if you spawn too many tasks, then the interpreter and the processor it's running on spend more CPU time managing the tasks than actually waiting for some network response. Not to mention that remote servers have their own bottlenecks (nevermind anti-DDOS protection).

    So, async doesn't change the fact that you only have so much speed in your single thread. Too many tasks will clog the interpreter, and too many requests to remote servers will cause them to get fussy with you (whether by bottleneck or by anti-DDOS measures). So yes, you do need to limit your maximum concurrency.

    In my local tests with trio and httpx, I get around 400-500 requests/second with around 128-256 concurrent tasks (a rough sketch of that setup follows this answer). Using more tasks than that reduces my total requests/second while burning more CPU time -- at that point, the interpreter does more task management than requesting.

    (I could probably optimize my local code and save my interpreter some task-management CPU time, and maybe if I did that I could get 600 or 800 requests/second, but frankly the remote server I talk to can't really handle more requests anyway, so optimizing my local CPU usage would be mostly a waste of time. Better to add new features instead.)
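As a rough illustration of the kind of setup described above (not the exact benchmark code), a capacity-limited trio + httpx run might look like the sketch below; the URL list and the limit of 128 are assumptions, and results are simply discarded rather than collected:

    import trio
    import httpx

    async def fetch(client, limiter, url):
        # The CapacityLimiter allows at most `max_concurrency` requests in flight.
        async with limiter:
            resp = await client.get(url)
            return resp.status_code

    async def main(urls, max_concurrency=128):
        limiter = trio.CapacityLimiter(max_concurrency)
        async with httpx.AsyncClient() as client:
            async with trio.open_nursery() as nursery:
                for url in urls:
                    # start_soon fires and forgets; return values are not gathered here.
                    nursery.start_soon(fetch, client, limiter, url)

    trio.run(main, ["https://example.com"] * 10)

Raising or lowering max_concurrency is then a one-line change, which makes it easy to measure where requests/second stops improving on your own hardware and target server.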