Search code examples
timeoutazure-sql-databaseazure-webjobs

Timeout errors which happens sometimes on Azure SQL Databases called in Azure Web jobs


We have a web application running on Azure that performs miscellaneous database maintenance tasks like creating databases, deleting unused databases, and so on. Everything is running on Azure SQL.

This application runs 24/24, and the maintenance tasks are performed every hour. Most of the time, everyhing goes well. However, the task sometimes ends up with errors like those ones :

  • HTTP error GatewayTimeout : The gateway did not receive a response from ‘Microsoft.Sql’ within the specified time period
  • HTTP error ServiceUnavailable : The request timed out
  • SQLException : Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
  • SQLException : A connection was successfully established with the server, but then an error occurred during the pre-login handshake

It seems like the database is not reachable when this happens.

We'd be glad if someone could help us to debug the issue.

thank you in advance.


Solution

  • There are transient errors and other type of errors that are particular to Azure SQL Database. Transient fault errors typically manifest as one of the following error messages from your client programs:

    •Database on server is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of

    •Database on server is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of . (Microsoft SQL Server, Error: 40613)

    •An existing connection was forcibly closed by the remote host.

    •System.Data.Entity.Core.EntityCommandExecutionException: An error occurred while executing the command definition. See the inner exception for details. ---> System.Data.SqlClient.SqlException: A transport-level error has occurred when receiving results from the server. (provider: Session Provider, error: 19 - Physical connection is not usable)

    •An connection attempt to a secondary database failed because the database is in the process of reconfguration and it is busy applying new pages while in the middle of an active transation on the primary database.

    Because of those errors and more explained here. It is necessary to create a retry logic on applications that connect to Azure SQL Database.

    public void HandleTransients()
    {
        var connStr = "some database";
        var _policy = RetryPolicy.Create < SqlAzureTransientErrorDetectionStrategy(
            retryCount: 3,
            retryInterval: TimeSpan.FromSeconds(5));
    
        using (var conn = new ReliableSqlConnection(connStr, _policy))
        {
            // Do SQL stuff here.
        }
    }
    

    More about how to create a retry logic here.

    Throttling is also a cause of timeouts. The following queries may help you understand the impact of workloads on the Azure SQL database.

    SELECT 
        (COUNT(end_time) - SUM(CASE WHEN avg_cpu_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'CPU Fit Percent'
        ,(COUNT(end_time) - SUM(CASE WHEN avg_log_write_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'Log Write Fit Percent'
        ,(COUNT(end_time) - SUM(CASE WHEN avg_data_io_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'Physical Data Read Fit Percent'
    FROM sys.dm_db_resource_stats
    

    --service level objective (SLO) of 99.9% <= go to next tier