WebClient does not support concurrent I/O operations - DownloadStringAsync - Scraping


First of all, I have read the similar questions, but they don't give me a coherent explanation. I use a BlockingCollection<WebClient> ClientQueue to provide WebClients. I give each one a handler function and start the async scraping:

// Create queue of WebClient instances
BlockingCollection<WebClient> ClientQueue = new BlockingCollection<WebClient>();
for (int i = 0; i < 10; i++)
{
   ClientQueue.Add(new WebClient());
}

//Triggering Async Calls
foreach (var item in source)
{
   var worker = ClientQueue.Take();
   worker.DownloadStringCompleted += (sender, e) => HandleJson(sender, e, ClientQueue, item);
   worker.DownloadStringAsync(uri);
}

public static void HandleJson(object sender, EventArgs e, BlockingCollection<WebClient> ClientQueue, string item)
{
   var res = (DownloadStringCompletedEventArgs) e;
   var jsonData = res.Result;
   var worker = (WebClient) sender;
   var root = JsonConvert.DeserializeObject<RootObject>(jsonData);
   // Record the data
   while (worker.IsBusy) Thread.Sleep(5); // wait for the webClient to be free
   ClientQueue.Add(worker);
 }

I get this error message:

WebClient does not support concurrent I/O operations.

Other threads:

  • This answer suggests that the fix is to wait until WebClient.IsBusy is false, but I already do this before putting the WebClient back in the queue. I don't understand why the client cannot perform a new request once IsBusy has become false: https://stackoverflow.com/a/9765812/7111121

  • This answer suggests recycling WebClients to optimize the process: https://stackoverflow.com/a/7474959/2132352

  • Another suggests instantiating a new WebClient each time (an easy solution, of course, but I don't want a workaround that hides how the objects I use actually work). It also suggests cancelling the operation, but that has not helped.


Solution

  • The problem is that each time a particular WebClient is taken from the queue, you register a new handler for its DownloadStringCompleted event without deregistering the previous one, so the handlers accumulate. As a consequence, HandleJson is called several times when an async download completes, and ClientQueue.Add(worker) returns the same client to the queue several times too. The IsBusy check does not prevent this: each accumulated handler still adds the client back, so the queue ends up holding duplicate references to it. It is then just a matter of time before two concurrent downloads are issued on the same WebClient.

    This can be fixed easily by registering the event handler just once, when the WebClient is created, and removing the item parameter from the HandleJson method.

    BlockingCollection<WebClient> ClientQueue = new BlockingCollection<WebClient>();
    for (int i = 0; i < 2; i++)
    {
        var worker = new WebClient();
        // Subscribe once per client, at creation time, so handlers never accumulate
        worker.DownloadStringCompleted += (sender, e) => HandleJson(sender, e, ClientQueue);
        ClientQueue.Add(worker);
    }
    

    If the item parameter is required, pass it as the userToken argument of DownloadStringAsync(uri, item) and read it back from e.UserState:

    foreach (var item in source)
    {
       var worker = ClientQueue.Take();
       worker.DownloadStringAsync(uri, item);
    }
    
    public static void HandleJson(object sender, DownloadStringCompletedEventArgs e, BlockingCollection<WebClient> ClientQueue)
    {
        string item = (string)e.UserState;
        ...
    }
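The following sketch reproduces the handler accumulation without any network traffic. It rests on stated assumptions. FakeClient and HandlerDemo are hypothetical names, not real APIs. FakeClient only imitates WebClient's event-based pattern (a completion event plus a userToken) and completes synchronously. The sketch runs the same one-client pool twice. The first run subscribes inside the loop, as in the question. The second run subscribes once at creation time and passes the item as the userToken, as in the answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for WebClient (not a real API): it mimics the
// event-based async pattern (completion event + userToken), but it
// completes synchronously and performs no network I/O.
public class FakeClient
{
    public event Action<FakeClient, object> Completed;

    public void StartAsync(object userToken)
    {
        // A real WebClient raises DownloadStringCompleted later, on another thread;
        // here every attached handler runs immediately.
        Completed?.Invoke(this, userToken);
    }
}

public static class HandlerDemo
{
    // Pushes `downloads` requests through a pool holding one client and reports
    // how many entries the pool holds afterwards and which tokens the handlers saw.
    public static (int PoolSize, List<object> Seen) Run(int downloads, bool subscribeOnce)
    {
        var pool = new BlockingCollection<FakeClient>();
        var seen = new List<object>();
        var client = new FakeClient();

        if (subscribeOnce)
        {
            // Fix: exactly one handler per client, attached when the client is created.
            client.Completed += (sender, token) => { seen.Add(token); pool.Add(sender); };
        }
        pool.Add(client);

        for (int i = 0; i < downloads; i++)
        {
            var worker = pool.Take();
            if (!subscribeOnce)
            {
                // Bug from the question: every Take() attaches one more handler,
                // so each completion runs all handlers attached so far.
                worker.Completed += (sender, token) => { seen.Add(token); pool.Add(sender); };
            }
            // The per-request item travels as the userToken, like DownloadStringAsync(uri, item).
            worker.StartAsync("item" + i);
        }
        return (pool.Count, seen);
    }

    public static void Main()
    {
        var buggy = Run(5, subscribeOnce: false);
        var fixedRun = Run(5, subscribeOnce: true);
        Console.WriteLine($"buggy: pool={buggy.PoolSize}, handler calls={buggy.Seen.Count}");
        Console.WriteLine($"fixed: pool={fixedRun.PoolSize}, handler calls={fixedRun.Seen.Count}");
    }
}
```

With five downloads, the buggy run leaves the single client in the pool 11 times and fires the handlers 15 times (1 + 2 + 3 + 4 + 5). The fixed run keeps it in the pool exactly once and handles each item once. With a real WebClient the downloads overlap. A pool that holds the same client several times will therefore eventually hand it to two callers at once, and that is when the "WebClient does not support concurrent I/O operations" error appears.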