I was wondering if it is possible to retrieve the net changes similar to cdc.fn_cdc_get_net_changes_<capture_instance>(from_lsn , to_lsn, 'all with mask')
of tables that don't have a primary key but do have a constraint that ensures that one (or more) column(s) is unique.
It took me a while but I think I have a solution that works, let me know if there's a better solution or if you see a bug in mine.
Let's assume a capture instance named capture_instance
of a table with unique column ID
and non-unique columns field1
, field2
and field3
and variables @from_lsn
and @to_lsn
.
WITH
cdc_all AS (
-- Retrieve the change table with all changes
SELECT *
FROM cdc.fn_cdc_get_all_changes_capture_instance(@from_lsn, @to_lsn, 'all')
),
f AS (
SELECT cdc_all.*, ops.[delete], ops.[insert], ops.[update], ops.[net_op]
FROM cdc_all
INNER JOIN (
-- Retrieve three flags for insert, update and delete and the net operation
-- also filter insert + delete pairs because it results in no change
SELECT *
FROM (
SELECT ID
, MAX(CASE WHEN __$operation = 1 THEN 1 ELSE 0 END) as [delete]
, MAX(CASE WHEN __$operation = 2 THEN 1 ELSE 0 END) as [insert]
, MAX(CASE WHEN __$operation = 4 THEN 1 ELSE 0 END) as [update]
, MIN(__$operation) [net_op]
FROM cdc_all
GROUP BY ID
) ops
WHERE NOT (ops.[delete] = 1 AND ops.[insert] = 1)
) ops ON cdc_all.ID = ops.ID
)
SELECT net.[max_lsn], f.[net_op] __$operation
, (CASE WHEN net.__$update_mask != 0x0 THEN net.__$update_mask ELSE NULL END) __$update_mask
, f.[ID], [field1], [field2], [field3]
FROM f
INNER JOIN (
-- bitwise OR the __$update_mask of the updates
-- also retrieve the last lsn of each row which should be used as the __$start_lsn of the result set
SELECT ID
, CAST(SUM(DISTINCT CAST((CASE WHEN f.[__$operation] = 4 AND f.[insert] != 1 THEN f.[__$update_mask] ELSE 0 END) as int)) as varbinary(2)) [__$update_mask]
, MAX(__$start_lsn) [max_lsn]
FROM f
GROUP BY ID
) net ON f.ID = net.ID AND f.__$start_lsn = net.[max_lsn]
To match the behavior of cdc.fn_cdc_get_net_changes_
exactly the size of the varbinary at the end should be as small as possible for all fields to fit, but a larger value wouldn't break the functionality.