c# multithreading async-await httpwebrequest task-parallel-library

Releasing threads during async tasks

I have a system that spawns a LOT of sub-processes that must run in parallel.

The main thread for a request will spawn sub-processes and wait for them to complete.
- Those sub-processes do some processing
- then talk to a remote API
- then do more processing of the results from the API
Then the main thread continues when all sub-processes are complete (or a timeout is hit).

We have trouble with thread counts, so we want to reduce the number of threads active by trying to release the threads whilst we wait for the remote API.

Originally we used WebRequest.GetResponse() for the API call, which naturally holds an idle thread whilst waiting for the API.

Then we started using an EAP model (Event-based Async Programming ... all the various .NET methods that use IAsyncResult), where we call BeginGetResponse(CallbackTrigger), with WaitHandles passed back to the main thread, which then triggers the post-API processing.

As we understand it, this means that the sub-process thread terminates, and the callback is triggered by a network-card-level interrupt, which causes a new thread to run the callback; i.e. there's no thread sitting waiting to run CallbackTrigger whilst we wait for the API call.

If people could confirm this understanding that would be good?

We are now considering moving to a TPL model (Task Parallel Library ... Task<T>), using WebRequest.GetResponseAsync(), which is awaitable. I'm under the impression that this is part of what async/await does: await passes control back up the call stack whilst we wait for the remote source, so if I start a bunch of awaitable Tasks and then call Task.WaitAll, that won't hold onto a thread for each Task whilst that Task is awaiting the remote API.

Have I correctly understood this?

Solution

"If people could confirm this understanding that would be good?"

Yes. Note that the IAsyncResult/Begin*/End* pattern is APM (the Asynchronous Programming Model), not EAP. EAP is WebClient's approach, where a method such as DownloadStringAsync raises a DownloadStringCompleted event when it's done.

APM/EAP are hard ways of doing asynchronous work, but they are in fact asynchronous (meaning they do not take up a thread just to block on I/O completing). They're "hard" because they make your code much more complex, to the point that most developers never used them and just stuck with synchronous code instead.

"Have I correctly understood this?"

Yes. In general, all asynchronous I/O in .NET is implemented using a single I/O completion port which exists as part of the thread pool. This is true whether the API is APM, EAP, or TAP.

The whole idea of async/await with TAP is that the core Tasks (like those returned from GetResponseAsync) are still built on the same asynchronous I/O system, and then async/await makes consuming them much more pleasant; you can stay in the same method with await instead of messing with callbacks (APM) or event handlers (EAP).
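As a rough illustration of the shape this gives the question's workload, here is a minimal sketch. CallRemoteApiAsync is a hypothetical stand-in that uses Task.Delay instead of a real web request, and the arithmetic is placeholder "processing"; none of these names come from the original system.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class FanOutSketch
{
    // Hypothetical stand-in for the remote API call. Task.Delay completes via a
    // timer, so no thread is blocked while it is pending.
    private static async Task<int> CallRemoteApiAsync(int input)
    {
        await Task.Delay(200);
        return input * 2;
    }

    // One "sub-process": pre-processing, the awaited API call, post-processing.
    private static async Task<int> ProcessAsync(int input)
    {
        int prepared = input + 1;                           // pre-processing
        int apiResult = await CallRemoteApiAsync(prepared); // no thread held while waiting
        return apiResult + 1;                               // post-processing
    }

    // Starts every operation, then asynchronously waits for all of them.
    public static async Task<int[]> RunAllAsync(int count)
    {
        Task<int>[] tasks = Enumerable.Range(0, count)
            .Select(i => ProcessAsync(i))
            .ToArray();
        return await Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        var stopwatch = Stopwatch.StartNew();
        int[] results = await RunAllAsync(1000);
        Console.WriteLine($"{results.Length} operations finished in {stopwatch.ElapsedMilliseconds} ms");
    }
}
```

Note that Task.WaitAll would also wait for the whole set, but it blocks the calling thread while doing so; awaiting Task.WhenAll releases that thread as well.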

As an interesting side note, Task actually implements IAsyncResult, and from a high-level perspective APM and TAP are very similar (both IAsyncResult and Task represent an operation "in flight").
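A small sketch of that relationship, assuming nothing beyond the base class library: a Task can be handed to code that only knows about IAsyncResult. The class and method names here are made up for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskAsAsyncResultSketch
{
    // Treats a Task purely through the APM-era IAsyncResult interface.
    public static bool WaitViaApmInterface(Task task)
    {
        IAsyncResult asyncResult = task;        // implicit conversion: Task implements IAsyncResult
        asyncResult.AsyncWaitHandle.WaitOne();  // blocks the caller; shown only to exercise the interface
        return asyncResult.IsCompleted;
    }

    public static void Main()
    {
        Console.WriteLine(WaitViaApmInterface(Task.Delay(50)));
    }
}
```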

You should find your TAP code significantly simpler (and easier to maintain!) than your current APM/EAP code, with no noticeable change in performance.

(On a side note, consider moving to HttpClient, which was designed from the ground up with TAP in mind, rather than HttpWebRequest/WebClient, which have had TAP bolted on.)
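For a sense of what the HttpClient version might look like, here is a hedged sketch. StubApiHandler and the api.example.invalid address are assumptions added so the example runs without a network; real code would construct HttpClient without the stub and point it at the actual API. The CancellationTokenSource shows one possible way to apply the timeout mentioned in the question.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler that fakes the remote API so the sketch runs offline.
public sealed class StubApiHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        await Task.Delay(100, cancellationToken); // simulated network latency
        return new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("payload for " + request.RequestUri.AbsolutePath)
        };
    }
}

public static class HttpClientSketch
{
    // A single shared instance; the stub handler is for illustration only.
    private static readonly HttpClient Client = new HttpClient(new StubApiHandler());

    public static async Task<string> FetchAsync(string path, CancellationToken token)
    {
        // The URL is a placeholder; the stub handler never resolves it.
        using (HttpResponseMessage response =
            await Client.GetAsync("http://api.example.invalid/" + path, token))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }

    public static async Task Main()
    {
        // Cancels the request if it has not completed within five seconds.
        using (var timeout = new CancellationTokenSource(TimeSpan.FromSeconds(5)))
        {
            string body = await FetchAsync("items/1", timeout.Token);
            Console.WriteLine(body);
        }
    }
}
```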

However...

"I have a system that spawns a LOT of sub-processes that must run in parallel..."

With this kind of a "pipeline", you may want to consider converting to TPL Dataflow. Dataflow understands both synchronous and asynchronous (TAP) work, and has built-in support for throttling. A Dataflow approach may simplify your code even further than TAP on its own.
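To make that concrete, here is a minimal sketch of the three stages as a Dataflow pipeline. It assumes the System.Threading.Tasks.Dataflow library is available (on .NET Framework it comes from a NuGet package). CallRemoteApiAsync is again a hypothetical Task.Delay stand-in, and the limit of 50 concurrent API calls is an arbitrary illustrative value.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowSketch
{
    // Hypothetical stand-in for the remote API call.
    private static async Task<int> CallRemoteApiAsync(int input)
    {
        await Task.Delay(100);
        return input * 2;
    }

    public static async Task<List<int>> RunPipelineAsync(IEnumerable<int> inputs)
    {
        // Stage 1: synchronous pre-processing.
        var preProcess = new TransformBlock<int, int>(x => x + 1);

        // Stage 2: asynchronous API call, throttled to 50 in flight at once.
        var callApi = new TransformBlock<int, int>(
            x => CallRemoteApiAsync(x),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 50 });

        // Stage 3: synchronous post-processing. ActionBlock handles one item at a
        // time by default, so adding to the list needs no extra locking.
        var results = new List<int>();
        var postProcess = new ActionBlock<int>(x => results.Add(x + 1));

        var link = new DataflowLinkOptions { PropagateCompletion = true };
        preProcess.LinkTo(callApi, link);
        callApi.LinkTo(postProcess, link);

        foreach (int input in inputs)
        {
            await preProcess.SendAsync(input);
        }

        preProcess.Complete();        // no more input
        await postProcess.Completion; // asynchronously wait for the pipeline to drain
        return results;
    }

    public static async Task Main()
    {
        List<int> results = await RunPipelineAsync(new[] { 0, 1, 2, 3, 4 });
        Console.WriteLine(string.Join(", ", results));
    }
}
```

The throttle lives in one place (MaxDegreeOfParallelism on the API stage), which is the main simplification over capping concurrency by hand around Task.WhenAll.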