Search code examples
sql-servercdc

Retrieve CDC net changes without primary key but with unique field


I was wondering if it is possible to retrieve the net changes similar to cdc.fn_cdc_get_net_changes_<capture_instance>(from_lsn , to_lsn, 'all with mask') of tables that don't have a primary key but do have a constraint that ensures that one (or more) column(s) is unique.


Solution

  • It took me a while but I think I have a solution that works, let me know if there's a better solution or if you see a bug in mine.

    Let's assume a capture instance named capture_instance of a table with unique column ID and non-unique columns field1, field2 and field3 and variables @from_lsn and @to_lsn.

    WITH
    cdc_all AS (
      -- Retrieve the change table with all changes
      SELECT *
      FROM cdc.fn_cdc_get_all_changes_capture_instance(@from_lsn, @to_lsn, 'all')
    ),
    f AS (
      SELECT cdc_all.*, ops.[delete], ops.[insert], ops.[update], ops.[net_op]
      FROM cdc_all
      INNER JOIN (
        -- Retrieve three flags for insert, update and delete and the net operation
        -- also filter insert + delete pairs because it results in no change
        SELECT *
        FROM (
          SELECT ID
               , MAX(CASE WHEN __$operation = 1 THEN 1 ELSE 0 END) as [delete]
               , MAX(CASE WHEN __$operation = 2 THEN 1 ELSE 0 END) as [insert]
               , MAX(CASE WHEN __$operation = 4 THEN 1 ELSE 0 END) as [update]
               , MIN(__$operation) [net_op]
          FROM cdc_all
          GROUP BY ID
        ) ops
        WHERE NOT (ops.[delete] = 1 AND ops.[insert] = 1)
      ) ops ON cdc_all.ID = ops.ID
    )
    SELECT net.[max_lsn], f.[net_op] __$operation
         , (CASE WHEN net.__$update_mask != 0x0 THEN net.__$update_mask ELSE NULL END) __$update_mask
         , f.[ID], [field1], [field2], [field3]
    FROM f
    INNER JOIN (
      -- bitwise OR the __$update_mask of the updates
      -- also retrieve the last lsn of each row which should be used as the __$start_lsn of the result set
      SELECT ID
           , CAST(SUM(DISTINCT CAST((CASE WHEN f.[__$operation] = 4 AND f.[insert] != 1 THEN f.[__$update_mask] ELSE 0 END) as int)) as varbinary(2)) [__$update_mask]
           , MAX(__$start_lsn) [max_lsn]
      FROM f
      GROUP BY ID
    ) net ON f.ID = net.ID AND f.__$start_lsn = net.[max_lsn]
    

    To match the behavior of cdc.fn_cdc_get_net_changes_ exactly the size of the varbinary at the end should be as small as possible for all fields to fit, but a larger value wouldn't break the functionality.