I am having a problem with Parallel.ForEach
. I have written simple application that adds file names to be downloaded to the queue, then using while loop it iterates through the queue, downloads file one at a time, then when file has been downloaded, another async method is called to create object from downloaded memoryStream
. Returned result of this method is not awaited, it is discarded, so the next download starts immediately. Everything works fine if I use simple foreach
in object creation - objects are being created while download is continuing. But if I would like to speed up the object creation process and use Parallel.ForEach
it stops download process until the object is created. UI is fully responsive, but it just won't download the next object. I don't understand why is this happening - Parallel.ForEach
is inside await Task.Run()
and to my limited knowledge about asynchronous programming this should do the trick. Can anyone help me understand why is it blocking first method and how to avoid it?
Here is a small sample:
public async Task DownloadFromCloud(List<string> constructNames)
{
_downloadDataQueue = new Queue<string>();
var _gcsClient = StorageClient.Create();
foreach (var item in constructNames)
{
_downloadDataQueue.Enqueue(item);
}
while (_downloadDataQueue.Count > 0)
{
var memoryStream = new MemoryStream();
await _gcsClient.DownloadObjectAsync("companyprojects",
_downloadDataQueue.Peek(), memoryStream);
memoryStream.Position = 0;
_ = ReadFileXml(memoryStream);
_downloadDataQueue.Dequeue();
}
}
private async Task ReadFileXml(MemoryStream memoryStream)
{
var reader = new XmlReader();
var properties = reader.ReadXmlTest(memoryStream);
await Task.Run(() =>
{
var entityList = new List<Entity>();
foreach (var item in properties)
{
entityList.Add(CreateObjectsFromDownloadedProperties(item));
}
//Parallel.ForEach(properties item =>
//{
// entityList.Add(CreateObjectsFromDownloadedProperties(item));
//});
});
}
EDIT
This is simplified object creation method:
public Entity CreateObjectsFromDownloadedProperties(RebarProperties properties)
{
var path = new LinearPath(properties.Path);
var section = new Region(properties.Region);
var sweep = section.SweepAsMesh(path, 1);
return sweep;
}
Returned result of this method is not awaited, it is discarded, so the next download starts immediately.
This is also dangerous. "Fire and forget" means "I don't care when this operation completes, or if it completes. Just discard all exceptions because I don't care." So fire-and-forget should be extremely rare in practice. It's not appropriate here.
UI is fully responsive, but it just won't download the next object.
I have no idea why it would block the downloads, but there's a definite problem in switching to Parallel.ForEach
: List<T>.Add
is not threadsafe.
private async Task ReadFileXml(MemoryStream memoryStream)
{
var reader = new XmlReader();
var properties = reader.ReadXmlTest(memoryStream);
await Task.Run(() =>
{
var entityList = new List<Entity>();
Parallel.ForEach(properties, item =>
{
var itemToAdd = CreateObjectsFromDownloadedProperties(item);
lock (entityList) { entityList.Add(itemToAdd); }
});
});
}
One tip: if you have a result value, PLINQ
is often cleaner than Parallel
:
private async Task ReadFileXml(MemoryStream memoryStream)
{
var reader = new XmlReader();
var properties = reader.ReadXmlTest(memoryStream);
await Task.Run(() =>
{
var entityList = proeprties
.AsParallel()
.Select(CreateObjectsFromDownloadedProperties)
.ToList();
});
}
However, the code still suffers from the fire-and-forget problem.
For a better fix, I'd recommend taking a step back and using something more suited to "pipeline"-style processing. E.g., TPL Dataflow:
public async Task DownloadFromCloud(List<string> constructNames)
{
// Set up the pipeline.
var gcsClient = StorageClient.Create();
var downloadBlock = new TransformBlock<string, MemoryStream>(async constructName =>
{
var memoryStream = new MemoryStream();
await gcsClient.DownloadObjectAsync("companyprojects", constructName, memoryStream);
memoryStream.Position = 0;
return memoryStream;
});
var processBlock = new TransformBlock<MemoryStream, List<Entity>>(memoryStream =>
{
var reader = new XmlReader();
var properties = reader.ReadXmlTest(memoryStream);
return proeprties
.AsParallel()
.Select(CreateObjectsFromDownloadedProperties)
.ToList();
});
var resultsBlock = new ActionBlock<List<Entity>>(entities => { /* TODO */ });
downloadBlock.LinkTo(processBlock, new DataflowLinkOptions { PropagateCompletion = true });
processBlock.LinkTo(resultsBlock, new DataflowLinkOptions { PropagateCompletion = true });
// Push data into the pipeline.
foreach (var constructName in constructNames)
await downloadBlock.SendAsync(constructName);
downlodBlock.Complete();
// Wait for pipeline to complete.
await resultsBlock.Completion;
}