I have a multithreaded Windows Service I've developed with VS 2010 (.NET 4.0) which can have anywhere from a few to a few dozen threads, each retrieving data from a slow server over the Internet and then using a local database to record this data (so the process is Internet-bound, not LAN or CPU bound).
With some regularity, I am getting a flood/flurry/burst of the following error from several threads simultaneously:
System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The call stack for this error is typically:
at System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject)
at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
at System.Data.SqlClient.SqlConnection.Open()
I'm not specifying a Connection Timeout in the connection string, and there are other applications and processes working in this database. Has anyone come across this kind of behavior and if so what was done to prevent it?
The most commonly-called method in my data access layer looks like this, and all my other DAL methods follow the same approach:
using (SqlConnection con = new SqlConnection(GetConnectionString()))
using (SqlCommand cmd = new SqlCommand("AddGdsMonitorLogEntry", con))
{
cmd.CommandType = CommandType.StoredProcedure;
/* setting cmd.Parameters [snipped] */
// We have been getting some timeouts writing to the log; wait a little longer than the default.
cmd.CommandTimeout *= 4;
con.Open();
cmd.ExecuteNonQuery();
}
Thanks very much!
EDIT
Given comments about this occurring in mirrored environments, I should indeed mention that the database in question is mirrored. It's marked in SSMS as "Principal, Synchronized", in "High safety without automatic failover (synchronous)" mode.
EDIT 5/26/11
I am seeing nothing in the SQL Server logs to indicate any problems. (I don't have access to the Windows Event Viewer on that server, but I've asked for someone to look for me.)
According to the MSDN Blog post just created today (hooray for Google!):
Microsoft has confirmed that this is a problem in the current release of ADO.NET. This issue will be fixed in ADO.NET version, ships with Visual Studio 2011.
In the meantime, we request to use the following workarounds:
Increase the connection string timeout to 150 sec. This will give the first attempt enough time to connect( 150* .08=12 sec)
Add MinPool Size=20 in the connection string. This will always maintain a minimum of 20 connections in the pool and there will be less chances of creating new connection, thus reducing the chance of this error.
Improve the network performance. Update your NIC drivers to the latest firmware version. We have seen network latency when your NIC card is not compatible with certain Scalable Networking Pack settings. If you are on Windows Vista SP1 or above you may also consider disabling Receive Window Auto-Tuning. If you have NIC teaming enabled, disabling it would be a good option.
The post itself is an interesting read, talking about a TCP/IP connection retry algorithm. And kudos to all the folks who said "hey this looks like it's related to mirroring..."! And note the comment about this being "because of slow response from SQL Server or due to network delays".
UGH!!!
Thanks to everyone who posted. Now we must all ask for a patch to the .NET Framework (or some other ADO.NET patching mechanism), so we don't have to wait for (and buy) Visual Studio 11...