I'm trying to write a method that reads each page of a PDF. Reading a page through the API takes a significant amount of time, and the PDFs I'm working with are hundreds of pages long, so I want to queue the reading of each page asynchronously and return the results when they're ready, so that multiple pages are read at once.
I use Task.Run to "queue" the task, and I expect to see the Debug log print the pages out of order, but they only execute in order, so I think they are being run synchronously. Any ideas?
var tasks = new List<Task>();
foreach (Page page in _pdfDoc.GetPages()) {
    var task = Task.Run(() => {
        //tried adding await Task.Yield() here, doesn't work
        Debug.WriteLine("searching page " + page.Number);
        if (page.Text.Contains(query)) {
            pagesWithQuery.Add(page.Number);
        }
        howManySearched += 1;
        Dispatcher.UIThread.InvokeAsync(() => {
            searchProgress.Value = howManySearched;
        });
        return Task.CompletedTask;
    });
    tasks.Add(task);
    // await task; <== does nothing??
}
// await Task.WhenAll(tasks); <== also nothing
I use Task.Run to "queue" the task, and I expect to see the Debug log print the pages out of order, but they only execute in order, so I think they are being run synchronously.
You don't have enough data to support this assumption. You are only logging the start of each Task:
Task task = Task.Run(() =>
{
    Debug.WriteLine("searching page " + page.Number);
...but you have no idea when the Task completes. You could get a better idea of the level of concurrency your code achieves by doing something like this:
object locker = new();
int concurrencyCounter = 0;
int maxConcurrency = 0;
Task task = Task.Run(() =>
{
    int concurrency = Interlocked.Increment(ref concurrencyCounter);
    lock (locker) maxConcurrency = Math.Max(maxConcurrency, concurrency);
    try
    {
        Debug.WriteLine("searching page " + page.Number);
        // Do work with the PDF page...
    }
    finally { Interlocked.Decrement(ref concurrencyCounter); }
});
//...
await Task.WhenAll(tasks);
Debug.WriteLine($"Maximum concurrency: {maxConcurrency}");
As a side note, your code is quite unidiomatic. You are not taking advantage of the Parallel class, the AsParallel PLINQ operator, or the Progress<T> class, and I suspect that there are also race conditions around the use of the undefined variables howManySearched and pagesWithQuery.
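To illustrate the side note, here is a minimal sketch of how the search might look with PLINQ and IProgress<T>. It makes several assumptions: the (Number, Text) tuple is a hypothetical stand-in for your PDF library's Page type, SearchAsync is an illustrative name, and I'm assuming the library tolerates reading different pages from multiple threads, which you would need to verify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: searches the pages in parallel and returns the
// numbers of the matching pages, in their original order.
static async Task<List<int>> SearchAsync(
    IReadOnlyList<(int Number, string Text)> pages,
    string query,
    IProgress<int>? progress = null)
{
    int searched = 0;
    // Task.Run keeps the blocking PLINQ query off the calling (UI) thread.
    return await Task.Run(() => pages
        .AsParallel()
        .AsOrdered() // preserve the source order in the results
        .Where(page =>
        {
            // In real code, the slow page read would happen here.
            bool match = page.Text.Contains(query);
            // Interlocked avoids the race on the shared counter.
            progress?.Report(Interlocked.Increment(ref searched));
            return match;
        })
        .Select(page => page.Number)
        .ToList());
}

// Usage with fake data (assumption: every 50th page contains the query).
var pages = Enumerable.Range(1, 200)
    .Select(n => (Number: n, Text: n % 50 == 0 ? "needle" : "hay"))
    .ToList();

// In a UI app, this is where you would update the progress bar.
var progress = new Progress<int>(count => { /* searchProgress.Value = count; */ });

List<int> hits = await SearchAsync(pages, "needle", progress);
Console.WriteLine(string.Join(", ", hits)); // 50, 100, 150, 200
```

The matching pages are collected by the query itself, so no shared list is mutated from multiple threads, and the only shared state is the counter, which is updated atomically. A Progress<T> created on the UI thread normally posts its callbacks back to that thread through the captured SynchronizationContext, which would make the explicit Dispatcher call unnecessary, assuming your UI framework installs one.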