I have read the similar questions, but none of them gives me a coherent explanation. I use a BlockingCollection<WebClient> named ClientQueue
to hand out WebClient instances. I attach a handler function to each one and start the asynchronous scraping:
// Create queue of WebClient instances
BlockingCollection<WebClient> ClientQueue = new BlockingCollection<WebClient>();
for (int i = 0; i < 10; i++)
{
    ClientQueue.Add(new WebClient());
}

// Trigger the async calls
foreach (var item in source)
{
    var worker = ClientQueue.Take();
    worker.DownloadStringCompleted += (sender, e) => HandleJson(sender, e, ClientQueue, item);
    worker.DownloadStringAsync(uri);
}

public static void HandleJson(object sender, EventArgs e, BlockingCollection<WebClient> ClientQueue, string item)
{
    var res = (DownloadStringCompletedEventArgs) e;
    var jsonData = res.Result;
    var worker = (WebClient) sender;
    var root = JsonConvert.DeserializeObject<RootObject>(jsonData);
    // Record the data
    while (worker.IsBusy) Thread.Sleep(5); // wait for the WebClient to be free
    ClientQueue.Add(worker);
}
I get this error message:
WebClient does not support concurrent I/O operations.
Other threads:
This answer suggests waiting until WebClient.IsBusy is false, but I already do that before putting the WebClient back in the queue. I don't understand why the client cannot perform a new request after IsBusy has become false:
https://stackoverflow.com/a/9765812/7111121
This answer suggests recycling WebClient instances to optimize the process: https://stackoverflow.com/a/7474959/2132352
Another answer suggests instantiating a new WebClient each time (an easy solution, of course, but I don't want something that hides how the objects actually work). It also suggests cancelling the operation, but that has not helped.
The problem is that each time a particular WebClient is taken from the queue, you register a new handler for its worker.DownloadStringCompleted event without deregistering the previous one, so the handlers accumulate. As a consequence, HandleJson is called multiple times after each async download completes, and ClientQueue.Add(worker) therefore returns the same client to the queue multiple times too. It is then just a matter of time before two concurrent downloads are issued on the same WebClient.
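The accumulation can be reproduced without any network access. The following is only a minimal sketch: FakeClient, Completed, and HandlerCallsOnLastRound are hypothetical names standing in for WebClient and its DownloadStringCompleted event, and are not part of the original code.

```csharp
using System;

public static class HandlerAccumulationDemo
{
    // Hypothetical stand-in for WebClient: one event, raised once per "download".
    private sealed class FakeClient
    {
        public event EventHandler Completed;
        public void Complete() => Completed?.Invoke(this, EventArgs.Empty);
    }

    // Subscribes a new handler before each of `rounds` operations on the same
    // client, as the loop in the question does, and returns how many times the
    // handlers ran for the final operation.
    public static int HandlerCallsOnLastRound(int rounds)
    {
        var client = new FakeClient();
        int calls = 0;
        for (int i = 0; i < rounds; i++)
        {
            calls = 0;                                   // count this round only
            client.Completed += (sender, e) => calls++;  // earlier handlers stay attached
            client.Complete();
        }
        return calls;
    }
}
```

HandlerCallsOnLastRound(3) returns 3: by the third operation three handlers are attached, so a single completion would put the same client back into the queue three times.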
This can be easily fixed by registering the event handler just once, when each WebClient is created, and removing the item parameter from the HandleJson method.
BlockingCollection<WebClient> ClientQueue = new BlockingCollection<WebClient>();
for (int i = 0; i < 2; i++)
{
    var worker = new WebClient();
    worker.DownloadStringCompleted += (sender, e) => HandleJson(sender, e, ClientQueue);
    ClientQueue.Add(worker);
}
If the item parameter is still required, pass it as the second argument to DownloadStringAsync(uri, item) and read it back in the handler from the UserState property of the event arguments:
foreach (var item in source)
{
    var worker = ClientQueue.Take();
    worker.DownloadStringAsync(uri, item);
}

public static void HandleJson(object sender, DownloadStringCompletedEventArgs e, BlockingCollection<WebClient> ClientQueue)
{
    // The value passed as the second argument to DownloadStringAsync
    string item = (string)e.UserState;
    ...
}
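Putting the pieces of this answer together, the pool setup might be sketched as follows. This is an illustration under assumptions rather than the original author's code: PooledDownloader, CreatePool, and the onResult callback are hypothetical names. Each client gets its handler exactly once, the handler reads the item from e.UserState, and the client goes back to the pool once per completed download.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;

public static class PooledDownloader
{
    // Builds a pool of WebClient instances, each with exactly one completion
    // handler. onResult (hypothetical) receives the item and the downloaded text.
    public static BlockingCollection<WebClient> CreatePool(int size, Action<string, string> onResult)
    {
        var pool = new BlockingCollection<WebClient>();
        for (int i = 0; i < size; i++)
        {
            var worker = new WebClient();
            worker.DownloadStringCompleted += (sender, e) =>
            {
                try
                {
                    // e.Result throws if the download failed or was cancelled,
                    // so check those cases first.
                    if (e.Error == null && !e.Cancelled)
                    {
                        onResult((string)e.UserState, e.Result);
                    }
                }
                finally
                {
                    // Return the client once per completed download.
                    pool.Add((WebClient)sender);
                }
            };
            pool.Add(worker);
        }
        return pool;
    }
}
```

A caller would then Take() a client from the returned pool and call worker.DownloadStringAsync(uri, item), exactly as in the loop above; no handler is registered at that point.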