Problem: I saw two implementations of Parallel.ForEach() downloading URLs with WebClient in an article. The author suggested that in the first example, if we have an array of 100 URLs, 100 WebClients will be started and most of them will time out. So he proposed a second implementation where he used thread-local state, stating that "as many WebClient() objects will be spawned as we need".
Question: How does the second example ensure that no timeouts will occur? Or, in other words, how does the second example take the local connection limit into consideration? Will the clients be reused or something?
Source:
// First example: a new WebClient is created for every URL/iteration
Parallel.ForEach(urls,
    (url, loopstate, index) =>
    {
        WebClient webclient = new WebClient();
        webclient.DownloadFile(url, filenames[index]);
    });
// Second example: one WebClient per worker thread, via thread-local state
Parallel.ForEach(urls,
    () => new WebClient(),                  // localInit: create the thread-local client
    (url, loopstate, index, webclient) =>
    {
        webclient.DownloadFile(url, filenames[index]);
        return webclient;                   // hand the same client on to the next iteration
    },
    (webclient) => { });                    // localFinally
Note: Spawning WebClients on multiple threads is only for demo purposes. I know that it will be more effective with async operations.
Link I got the source from (I simplified it a little): When Should I Use Parallel.ForEach? When Should I Use PLINQ? Look at the "Thread-Local State" chapter.
In other words, how does the second example take the local connection limit into consideration? Will the clients be reused or something?
What the second example does is, instead of creating a WebClient object per iteration, create one WebClient instance per thread. This means that if Parallel.ForEach is using 4 threads, it will create 4 instances and reuse them across iterations. Each client can therefore reuse the connection it has already opened, instead of every fresh instance having to wait for the other clients' connections to close.
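As a side note on how the thread-local overload behaves, here is a minimal, self-contained sketch (not the article's code; the download is replaced by a short sleep and a counter, and all names are made up for illustration) showing that localInit runs far fewer times than there are iterations, i.e. the local state is reused across iterations on the same worker:

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThreadLocalStateDemo
{
    static void Main()
    {
        // Counts how many times localInit ran on each managed thread.
        var initsPerThread = new ConcurrentDictionary<int, int>();

        Parallel.ForEach(
            Enumerable.Range(0, 100),              // stands in for the 100 urls
            () =>                                  // localInit: runs once per worker/partition, not per item
            {
                initsPerThread.AddOrUpdate(Environment.CurrentManagedThreadId, 1, (_, n) => n + 1);
                return 0;                          // the thread-local state (here just a counter)
            },
            (item, loopstate, index, counter) =>   // body: receives the local state and must return it
            {
                Thread.Sleep(10);                  // simulate the download
                return counter + 1;
            },
            counter => { });                       // localFinally: runs once per worker when it finishes

        // Typically prints a handful of entries, far fewer than the 100 iterations.
        foreach (var kvp in initsPerThread)
            Console.WriteLine($"Thread {kvp.Key}: localInit ran {kvp.Value} time(s)");
    }
}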
Eventually, all clients are fighting for the same IO resource, which is governed by the underlying ServicePointManager.DefaultConnectionLimit. The fewer connections you have open, the more time each request has to finish executing. This can also be addressed by increasing the allowed connection limit, which defaults to 2.
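If you want to raise that limit, set it before the requests are issued; a minimal sketch (the value 100 is just an illustrative choice, not a recommendation):

using System.Net;

// Allow up to 100 concurrent connections per host instead of the default 2.
// Must be set before the first request is made.
ServicePointManager.DefaultConnectionLimit = 100;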
Generally speaking, there's no need to use multiple threads to execute concurrent IO requests. Parallelism doesn't actually help here.
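To tie that back to the note in the question: the async route avoids both the thread-per-download pattern and the thread-local bookkeeping. A rough sketch of what that could look like, assuming a recent .NET runtime (File.WriteAllBytesAsync needs .NET Core 2.0 or later) and an arbitrary concurrency cap of 4:

using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class AsyncDownloader
{
    // A single shared HttpClient; its handler pools connections internally.
    private static readonly HttpClient client = new HttpClient();

    public static async Task DownloadAllAsync(string[] urls, string[] filenames, int maxConcurrency = 4)
    {
        var gate = new SemaphoreSlim(maxConcurrency);   // caps concurrent downloads instead of spawning threads

        var tasks = urls.Select(async (url, index) =>
        {
            await gate.WaitAsync();
            try
            {
                byte[] data = await client.GetByteArrayAsync(url);
                await File.WriteAllBytesAsync(filenames[index], data);
            }
            finally
            {
                gate.Release();
            }
        });

        await Task.WhenAll(tasks);
    }
}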