Search code examples
sql-serversql-server-2019change-tracking

SQL Server Change Tracking - dm_tran_commit_table and CHANGETABLE don't match; commit_time is null


The gist of what I'm trying to do: get the commit time for changes in a SQL Server table with change tracking on. Easy, right? Just join with the sys.dm_tran_commit_table DMV and look at the commit_time column. Unfortunately, I'm getting inconsistent results.

Here's my query:

SELECT TOP 100 *
FROM CHANGETABLE(CHANGES [MyDB].[dbo].MyTable, 0) CT
LEFT JOIN [MyDB].[dbo].MyTable C ON C.ID = CT.ID
LEFT JOIN [MyDB].sys.dm_tran_commit_table TCI ON CT.sys_change_creation_version = TCI.commit_ts
LEFT JOIN [MyDB].sys.dm_tran_commit_table TC ON CT.sys_change_version = TC.commit_ts
WHERE TC.commit_time IS NULL

I'd like to get the time a record was initially inserted (sys_change_creation_version) and the time of the latest commit (sys_change_version). But for reasons I can't explain, the first join above to the DMV returns data, but the second does not when sys_change_creation_version and sys_change_version are the same value.

See this screenshot:

enter image description here

How in the world does a join on the same table for the same value return results for one join but not the other?

Thinking there may be an issue with the DMV changing during my query execution, I tried pulling out all data from sys.dm_tran_commit_table into a temp table and then used that instead in my query above, but I get the same null results.

There must be something deeper inside change tracking that I'm not grokking that is causing this. Frankly, I'm not sure how/why the sys.dm_tran_commit_table DMV wouldn't have the commit_ts in it if CHANGETABLE is reporting it exists. Why is there a discrepancy between these two objects, and why does one join work but not the other?

Anyone with expertise here?


Solution

  • After much research I think I'm going to close this one out as being part of the fundamental machinations of MSSQL. A few things that I had not taken into account in my queries:

    • Change tracking clean up is happening behind the scenes, regularly, and so record counts returned by the CHANGETABLE function would continue dropping while nothing else was happening. This made it hard to reconcile change counts

    • Isolation levels means data can change from underneath you. Imagine this code, run on a table with 1 million records:

      SELECT Id, CHANGE_TRACKING_CURRENT_VERSION() FROM MyTable

      A very simplistic understanding (what I had going into this) would let you think CHANGE_TRACKING_CURRENT_VERSION() should return the same value for all million records, but in fact, records are being added and deleted while this select statement is running, which means CHANGE_TRACKING_CURRENT_VERSION keeps changing. In essence, my isolation level meant data was changing out from underneath me, which obviously makes it hard to get "exact" counts.

    • I was trying to retrieve ALL changes in the system (by passing in 0 or null in the CHANGETABLE function). When you do this, changes roll up and it becomes very difficult to truly know whether a record was inserted or updated, given my previous two bullet points. MSSQL change tracking is really designed to give you snapshots between change versions, but if you try to view them all at once, you're not going to be happy with the results you get.

    To sum up, if there's anyone out there trying to do deep change tracking analysis like I was, I'd leave this advice: rethink it. Unless you are saving external logs of all the changes between the versions you're looking at, or possibly the highest level of isolation in place, you're going to be very frustrated with results that don't stay consistent. In my case, I ended up writing down in another table the "log" of every extraction operation so that I had a static set of numbers when looking across change versions.