I have tried to implement an optimistic concurrency 'worker'.
Goal is to read a batch of data from the same database table (single table with no relations) with multiple parallel 'worker'. This did seem to work so far. I get optimistic concurrency exceptions here and there, catch them and retry.
So far so good, and the function to get the data works stable on my local setup. When moving the application to a test environment however, I get a strange timeout exception, which even if caught, will end the async function (breaks the while loop). Does someone see a flaw in the implementation? What could cause the timeout? What could cause the end of the async function?
public async IAsyncEnumerable<List<WorkItem>> LoadBatchedWorkload([EnumeratorCancellation] CancellationToken token, int batchSize, int runID)
{
DataContext context = null;
try
{
context = GetNewContext(); // create a new dbContext
List<WorkItem> workItems;
bool loadSuccessInner;
while (true)
{
if (token.IsCancellationRequested) break;
loadSuccessInner = false;
context.Dispose();
context = GetNewContext(); // create a new dbContext
RunState currentRunState = context.Runs.Where(a => a.Id == runID).First().Status;
try
{
// Error happens on the following line: Microsoft.Data.SqlClient.SqlException: Timeout
workItems = context.WorkItems.Where(a => a.State == ProcessState.ToProcess).Take(batchSize).ToList();
loadSuccessInner = true;
}
catch (Exception ex)
{
workItems = new List<WorkItem>();
}
if (workItems.Count == 0 && loadSuccessInner)
{
break;
}
//... update to a different RunState
//... if set successful yield the result
//... else cleanup and retry
}
}
finally
{
if (context != null) context.Dispose();
}
}
I verified that EntityFramework (here with MS SQL Server adapter) does full server side query, which
translates to a simple query like this: SELECT TOP 10 field_1, field_2 FROM WorkItems WHERE field_2 = 0
The query usually takes <1ms and the timeout is left on default of 30s
I verified that there are no cancelation requests fired
This happens also when there is only a single worker and no one else is accessing the database. I'm aware that a timeout can happen when the resource is busy or blocked. But until now, I never saw a timeout on any other query yet.
(I'll update this answer whenever more information is being provided.)
Does someone see a flaw in the implementation?
Generally, your code looks fine.
What could cause the end of the async function?
Nothing in the code you showed should normally be an issue. Start by putting another try-catch block inside the loop, to ensure, that no other exceptions are getting thrown anywhere else (especially later in the not shown code):
public async IAsyncEnumerable<List<WorkItem>> LoadBatchedWorkload([EnumeratorCancellation] CancellationToken token, int batchSize, int runID)
{
DataContext context = null;
try
{
context = GetNewContext();
List<WorkItem> workItems;
bool loadSuccessInner;
while (true)
{
try
{
// ... (the inner loop code)
}
catch (Exception e)
{
// TODO: Log the exception here using your favorite method.
throw;
}
}
}
finally
{
if (context != null) context.Dispose();
}
}
Take a look at your log and ensure, that the log does not show any exceptions being thrown. Then additionally log every possible exit condition (break
and return
) from the loop, to find out how and why the code exits the loop.
If there are no other break
or return
statements in your code, then the only way the code can exit from the loop is if zero workItems
are successfully returned from the database.
What could cause the timeout?
Make sure, that any Task
returning/async
methods you call are being called using await
.
To track down, where the exceptions are actually coming from, you should deploy a Debug
release with pdb
files to get a full stack trace with source code line references.
You can also implement a DbCommandInterceptor
and trace failing commands on your own:
public class TracingCommandInterceptor : DbCommandInterceptor
{
public override void CommandFailed(DbCommand command, CommandErrorEventData eventData)
{
LogException(eventData);
}
public override Task CommandFailedAsync(DbCommand command, CommandErrorEventData eventData, CancellationToken cancellationToken = new CancellationToken())
{
LogException(eventData);
return Task.CompletedTask;
}
private static void LogException(CommandErrorEventData eventData)
{
if (eventData.Exception is SqlException sqlException)
{
// -2 = Timeout error
// See https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/cc645611(v=sql.105)?redirectedfrom=MSDN
if (sqlException.Number == -2)
{
var stackTrace = new StackTrace();
var stackTraceText = stackTrace.ToString();
// TODO: Do some logging here and output the stackTraceText
// and other helpful information like the command text etc.
// -->
}
}
}
}
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
optionsBuilder.UseLoggerFactory(LoggingFactory);
optionsBuilder.UseSqlServer(connectionString);
optionsBuilder.EnableSensitiveDataLogging();
optionsBuilder.EnableDetailedErrors();
// Add the command interceptor.
optionsBuilder.AddInterceptors(new TracingCommandInterceptor());
base.OnConfiguring(optionsBuilder);
}
Additionally logging the command text of the failed command in the interceptor is also a good idea.