Search code examples
t-sqlsql-server-2014unpivotchange-data-capture

Tracking changes for field using CDC or T-SQL with CDC


Folowing CDC results set for update operation. Only area field has been updated. enter image description here

On previous sreenshot only part of gields that contains in table. Fields are much more. Time to time some from them updates some not. In following query i try to show statistic of changes by fields in usable view.

with History AS (
SELECT
cz.GUID as Id, 
cz.category,
isnull(cz.area, 0) as area, 
isnull(cz.oilwidthmin,0) as oilwidthmin, 
isnull(cz.oilwidthmax,0) as oilwidthmax, 
isnull(cz.efectivwidthmin,0) as efectivwidthmin,
isnull(cz.efectivwidthmax,0) as efectivwidthmax,
isnull(cz.koafporistmin,0) as koafporistmin, 
isnull(cz.koafporistmax,0) as koafporistmax,
CASE cz.__$operation 
WHEN 1 THEN 'DELETE'
WHEN 2 THEN 'INSERT'
WHEN 3 THEN 'Before UPDATE'
WHEN 4 THEN 'After UPDATE'
END operation,
map.tran_begin_time as beginT, 
map.tran_end_time as endT
FROM cdc.fn_cdc_get_all_changes_dbo_EXT_GeolObject_KategZalezh(sys.fn_cdc_get_min_lsn('dbo_EXT_GeolObject_KategZalezh'), sys.fn_cdc_get_max_lsn(), 'all') AS cz 
INNER JOIN  [cdc].[lsn_time_mapping] map
    ON cz.[__$start_lsn] = map.start_lsn
)
SELECT  field, val, operation, beginT, endT FROM History
unpivot ( [val] for field in
(
--category,
area, 
oilwidthmin, 
oilwidthmax, 
efectivwidthmin, 
efectivwidthmax, 
koafporistmin, 
koafporistmax))t    where id = '2D166098-7CBD-4622-9EB0-000070506FE6'   

Result of the query is following: enter image description here

But previous result contains extra data. Expected result must be following: enter image description here

I know that the CDC tracks changes by rows. Or maybe i'm wrong? If no, how can i doing some comparer for val field in SQL. I have not deep knowledge in t-sql, and everything that comes to my mind is somehow used by cursors. Any ideas? Maybe somehow use CT (change tracking)? Maybe somehow using group by?

Almost correct answer. Folowing query return expected result:

WITH History AS (
    SELECT
        *,
        CASE cz.__$operation 
            WHEN 1 THEN 'DELETE'
            WHEN 2 THEN 'INSERT'
            WHEN 3 THEN 'Before UPDATE'
            WHEN 4 THEN 'After UPDATE'
            END operation,
        map.tran_begin_time as beginT, 
        map.tran_end_time as endT
    FROM cdc.fn_cdc_get_all_changes_dbo_EXT_GeolObject_KategZalezh(sys.fn_cdc_get_min_lsn('dbo_EXT_GeolObject_KategZalezh'), sys.fn_cdc_get_max_lsn(), 'all') AS cz 
        INNER JOIN  [cdc].[lsn_time_mapping] map
            ON cz.[__$start_lsn] = map.start_lsn
    where cz.GUID = '2D166098-7CBD-4622-9EB0-000070506FE6'
),
UnpivotedValues AS(
    SELECT  guid, field, val, operation, beginT, endT 
    FROM History
        UNPIVOT ( [val] FOR field IN
        (
            area, 
            oilwidthmin, 
            oilwidthmax, 
            efectivwidthmin, 
            efectivwidthmax, 
            koafporistmin, 
            koafporistmax
        ))t
),
UnpivotedWithLastValue AS (
    SELECT 
        *,
        --Use LAG() to get the last value for the same field
        LAG(val, 1) OVER (PARTITION BY field ORDER BY BeginT) LastVal
    FROM UnpivotedValues
)
--Filter out record where the value equals the last value for the same field
SELECT * FROM UnpivotedWithLastValue WHERE val <> LastVal OR LastVal IS NULL ORDER BY guid

Result of this query looks like this: enter image description here

But when WHERE cz.GUID = is absent or if query using several GUIDs in WHERE predicate i get folowing result:

enter image description here This result for two GUIDs. Value for LastVal in first row must be 16691. Like val from row 4.


Solution

  • You cannot set up CDC to only track values for the columns that have changed. However, you can quite easily filter out the unchanged values in your query.

    Consider the following query, which is a simplified copy of your original query:

    WITH History AS (
        SELECT
            *,
            CASE cz.__$operation 
                WHEN 1 THEN 'DELETE'
                WHEN 2 THEN 'INSERT'
                WHEN 3 THEN 'Before UPDATE'
                WHEN 4 THEN 'After UPDATE'
                END operation,
            map.tran_begin_time as beginT, 
            map.tran_end_time as endT
        FROM cdc.fn_cdc_get_all_changes_Dbo_YourTable(sys.fn_cdc_get_min_lsn('Dbo_YourTable'), sys.fn_cdc_get_max_lsn(), 'all') AS cz 
            INNER JOIN  [cdc].[lsn_time_mapping] map
                ON cz.[__$start_lsn] = map.start_lsn
    ),
    UnpivotedValues AS(
        SELECT id, field, val, operation, beginT, endT, t.tran_id
        FROM History
            UNPIVOT ( [val] FOR field IN
            (Column1, Column2, Column3))t
    ),
    UnpivotedWithLastValue AS (
        SELECT 
            *,
            --Use LAG() to get the last value for the same field
            LAG(val, 1) OVER (PARTITION BY id, field ORDER BY BeginT) LastVal
        FROM UnpivotedValues
    )
    --Filter out record where the value equals the last value for the same field
    SELECT * FROM UnpivotedWithLastValue WHERE val <> LastVal OR LastVal IS NULL
    ORDER BY Id, beginT
    

    In this query I have used the LAG() function to get the last value for each field. Based on this value you can filter out the unchanged records in your final query, and shown above.