Search code examples
c#multithreadingsql-server-2016database-deadlocks

Can C# yield return cause problems if used in combination with temp tables and Change Tracking?


We are using Microsoft SQL Server 2016 (SP2-CU12) (KB4536648) - 13.0.5698.0 (X64) Feb 15 2020 01:47:30 Copyright (c) Microsoft Corporation Standard Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 (Build 9600: )

We have a table with more than a million rows, with lots of big fields in it (nvarchar(max), bigint, uniqueidentifier). We are using change tracking on the table. This table is accessed by a multi-thread web application and many other multi-thread applications as well.

Now in one of the applications, the developers did the following code.

public IEnumerable<MyObject> GetRecords(long trackingKey)
{
  using(SqlConnection conn = new SqlConnection(_connectionString))
  {
    conn.Open();
    using(SqlCommand cmd = conn.CreateCommand())
    {
      StringBuilder builder = new StringBuilder();
      builder.Append("SELECT Columns INTO #TEMP FROM CHANGETABLE(CHANGES BigTable, @TrackingKey) AS CT JOIN BigTable t ON t.Key=CT.Key WHERE Conditions=@Conditions 

       SELECT * FROM #TEMP ");
      cmd.CommandText = builder.ToString();
      cmd.Parameters.Add...;

      using(SqlDataReader reader = cmd.ExecuteReader())
      {
        while(reader.Read())
        {
          yield return (MyObject)ExecuteMyReader(reader);
        }
      }
    }
  }
}

Inside the "ExecuteMyReader", they are reading from the reader, and for that row, executing a second query to another table (which takes about 150 ms in average to execute).

Querying the change tables in SQL Query seems to lock the table while it's getting results from the Change tracking tables. That's why the devs store results in a temp table before doing the final SELECT query. (In order to try to avoid deadlocks as much as possible.)

Question #1 : Will the lock for the change tracking table will be held the while the second SELECT query completes? Or it will be released as soon as the results get into the #Temp table?

Question #2 : Is there any difference between doing a yield for the reader, as opposed to just do a classic method where everything is stored in a variable? For example, would the RAM and #Temp table be released sooner?

Question #3 : From a multi-thread point, if other queries try to access the same table at the same time, is there a better way to avoid deadlocks? (with yields, classic method without yields)


Solution

    1. in theory, it should just be the insert, but creating temp tables with into can cause problems; maybe try declaring a table variable instead

    2. iterators have the advantage that they don't need to buffer everything (they can be consumed as an open sequence) - so they can be more RAM efficient, but the side-effect of that is that the command/connection is potentially held open longer; this might be of particular concern if there are locks involved, as the DAL code has no idea how long the consumer will take between each row

    3. do you need all that isolation? maybe consider querying it at a lower isolation level, and use optimistic concurrency (i.e. rowversion) for fault detection?