Search code examples
entity-frameworkazure-sql-database.net-6.0sqlclient

Intermittent exception against SQL Azure: Unknown error 258


The exception

We have a production issue in one customer instance (of eight) that we aren't able to understand (and therefor not able to resolve). The issue happens less then 10 times on 250 calls to the same API. The exception is as follows:

An exception occurred while iterating over the results of a query for context type '"OurNamespace.OurDbContext2"'."
""System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> System.Transactions.TransactionException: The operation is not valid for the state of the transaction.
 ---> System.Transactions.TransactionPromotionException: Failure while attempting to promote transaction.
 ---> Microsoft.Data.SqlClient.SqlException (0x80131904): There is already an open DataReader associated with this Command which must be closed first.
 ---> System.ComponentModel.Win32Exception (258): Unknown error 258
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction2005(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote()
ClientConnectionId:09c2251f-ce5e-4f27-9c37-77ba27e69399
Error Number:−2,State:0,Class:11
ClientConnectionId before routing:f245b876-d483-4827-bdd9-ec446c664f64

Full call stack

at System.RuntimeMethodHandle.InvokeMethod(Object target, Span`1& arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull(Transaction tx)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.ActivateConnection(Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection(DbConnection owningObject, DbConnectionInternal obj, Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open()
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternal(Boolean errorsExpected)
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.Open(Boolean errorsExpected)
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReader(RelationalCommandParameterObject parameterObject)
   at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.InitializeReader(Enumerator enumerator)
   at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.MoveNext()
   at System.Linq.Enumerable.TryGetSingle[TSource](IEnumerable`1 source, Boolean& found)
   at lambda_method1124(Closure , QueryContext )
   at OurNamespace.AspNetCore.Identity.EntityFrameworkCore.TenantSettingsStore`1.FindById(Guid tenantId)
   at OurNamespace.AspNetCore.Identity.Caching.DefaultCache`1.GetOrAdd(String key, TimeSpan duration, Func`1 get)
   at OurNamespace.AspNetCore.Identity.ConfigureMultitenantIdentityOptions.Configure(TenantIdentityOptions options)
   at Microsoft.Extensions.Options.OptionsFactory`1.Create(String name)
   at System.Lazy`1.ViaFactory(LazyThreadSafetyMode mode)
   at System.Lazy`1.ExecutionAndPublication(LazyHelper executionAndPublication, Boolean useDefaultConstructor)
   at System.Lazy`1.CreateValue()
   at Microsoft.Extensions.Options.OptionsManager`1.Get(String name)
   at OurNamespace.AspNetCore.Identity.ExtendedUserValidator`1.get_TenantIdentityOptions()
   at OurNamespace.AspNetCore.Identity.MultitenantUserValidator`1.ValidateTenant(MultitenantUserManager`1 manager, TUser user, ICollection`1 errors)
   at OurNamespace.AspNetCore.Identity.MultitenantUserValidator`1.ValidateAsync(UserManager`1 manager, TUser user)
   at Microsoft.AspNetCore.Identity.UserManager`1.ValidateUserAsync(TUser user)
   at Microsoft.AspNetCore.Identity.UserManager`1.CreateAsync(TUser user)
   at OurNamespace.AspNetCore.Identity.UI.Areas.Identity.Controllers.API.UserViewModels.UserViewModelService`1.CreateUserAsync(TUser user, Boolean sendInvitation, String additionalInvitationQueryParameters)
   at OurNamespace.AspNetCore.Identity.UI.Areas.Identity.Controllers.API.UserViewModels.UserViewModelService`1.CreateUserAsync(CreateUserViewModel model)
   at OurNamespace.AspNetCore.Identity.UI.Areas.Identity.Controllers.API.PersonViewModels.CreatePersonAndUserService`2.<>c__DisplayClass7_0.<<Create>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at IRM.AspNetCore.Identity.ResilientTransaction.<>c__DisplayClass3_0.<<ExecuteAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.<>c__DisplayClass33_0`2.<<ExecuteAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
   at OurNamespace.AspNetCore.Identity.ResilientTransaction.ExecuteAsync(Func`2 action, CancellationToken cancellationToken)
   at OurNamespace.AspNetCore.Identity.UI.Areas.Identity.Controllers.API.PersonViewModels.CreatePersonAndUserService`2.Create(CreatePersonViewModel model, String password, ExternalLoginInfo loginInfo, CancellationToken cancellationToken)
   at OurNamespace.AspNetCore.Identity.UI.Areas.Identity.Controllers.API.PersonsController`2.Post(CreatePersonViewModel personViewModel, ICreatePersonAndUserService`2 creationService)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.TaskOfIActionResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Awaited|12_0(ControllerActionInvoker invoker, ValueTask`1 actionResultValueTask)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeNextActionFilterAsync>g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeInnerFilterAsync>g__Awaited|13_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeNextResourceFilter>g__Awaited|25_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Rethrow(ResourceExecutedContextSealed context)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeFilterPipelineAsync>g__Awaited|20_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Awaited|17_0(ResourceInvoker invoker, Task task, IDisposable scope)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Awaited|17_0(ResourceInvoker invoker, Task task, IDisposable scope)
   at Microsoft.AspNetCore.Routing.EndpointMiddleware.<Invoke>g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger)
   at OurNamespace.Serilog.AspNetCore.SerilogCallContextMiddleware.Invoke(HttpContext context, ICallContext callContext)
   at OurNamespace.AspNetCore.Identity.UI.ManageTenantMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Authorization.Policy.AuthorizationMiddlewareResultHandler.HandleAsync(RequestDelegate next, HttpContext context, AuthorizationPolicy policy, PolicyAuthorizationResult authorizeResult)
   at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
   at OurNamespace.AspNetCore.Authentication.Cookies.SameSiteCookieFixMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Localization.RequestLocalizationMiddleware.Invoke(HttpContext context)
   at OurNamespace.AspNetCore.ErrorHandling.ExceptionHandlerMiddleware.Awaited(ExceptionHandlerMiddleware middleware, HttpContext context, Task task)

Circumstances that we believe can affect the problem

Here are some thins to have in mind, that we believe can effect the problem:

  • The API where the problem is, is the only one working against two DbContext instances.
    • We therefor creates the execution strategy ourselves in-line with the recommended approach.
    • We use a TransactionScope (also as described in link above).
    • All calls to the first DbContext instance succeeds.
    • Both DbContext use the same database/connection string.
  • The application and the database exists in different Azure Subscriptions (in same datacenter).
    • This is the only instance of the eight where the customer have chosen to host the database themself.

Versions that we use:

  • .NET Core 6
  • Microsoft.EntityFrameworkCore (and related references for SQL Server): 6.0.20
  • Microsoft.Data.SqlClient: 5.1.1

It is a Linux App Service. We use SQL Azure also as our distributed cache.

Example code where it happens

The issue seems to happen on the first call to the database on OurDbContext2, which can be different operations depending on caching and other logic (callstack above is just one example). I choose the callstack above because in this (uncommon) example we get settings from database used in the options in a very simple method:

public TenantSettings FindById(Guid tenantId)
{
    using var scope = _serviceProvider.CreateScope();
    using var context = scope.ServiceProvider.GetService<TContext>()!;
            
    var dbSet = context.Set<TenantSettings>();
    var result = dbSet.AsNoTracking().SingleOrDefault(ts => ts.TenantId == tenantId) ?? new TenantSettings { TenantId = tenantId };
            
    return result;
}

The issue does not involve anything with options or this particular operation (please also ignore the fact that we use an anti-pattern of _serviceProvider.CreateScope because we have specific reasons for this), since it is happening in other situations as well. I choose the above operation, because I believe it makes it pretty clear that the issue is not related to "There is already an open DataReader associated with this Command which must be closed first.", since we create a new DbContext instance locally in its own DI scope.

This is another example where it happens:

protected virtual Task<List<TRole>> GetDefaultRolesForNewUserAsync(TUser user, CancellationToken cancellationToken = default)
{
    return Roles.Where(r => r.AddForNewUser).AsNoTracking().ToListAsync(cancellationToken);
}

Things we have tried

We increased the capacity of the SQL Azure instance (from 20 DTU to 50 DTU) and that resulted in fewer problems happening. Looking at metrics we can't see that the instance is under any kind of load though. We have for example checked sessions, DTU usage, CPU, Failed connections and other metrics.

We have enabled MARS in our connection string, but that did not affect the issue at all.

We have gone through all code to make sure we haven't missed any ToList/FirstOrDefault to actually get the objects before using or iterating through a list (since that is a common reason for "There is already an open DataReader associated with this Command which must be closed first.").

We have worked through many of the reports under troubleshooting App Service performance and availability. There does not seem to be any SNAT Port exhaustion and we did not find any other indications of possible problems on the app service.

Our best guess is that it is some kind of infrastructure problem with the SQL Azure instance (our experience is that "Unknown error 258" often is), but we have no ideas what it could be. The application is not under heavy load (two different App Service instances with about 7-8 rpm per instance) and no spikes is traffic.


Solution

  • We're starting get pretty sure that we have resolved this issue now. The problem seem to have been a combination of things. The root cause is most likely bad Guid:s that wasn't optimized for SQL Server (which they are in the other customer instances). This in turn caused high fragmentation of indexes quickly (we do index-optimizations each night, but they got bad anyway). On top of this our transaction caused a completely other locking situation in this customer instance, both in the tables and over the indexes. Our guess is therefor that this caused a client timeout (Error number=-2).

    We have adjusted the GUID creation so that they are optimized and we have excluded a read (in some scenarios two) in the database from the transaction.

    Since deployed five days ago, we do not have any more exceptions. The indexes still get high fragmentation, but we hope this situation will become better over time (otherwise we'll look into if we can change the badly created GUID values).