Search code examples
sql-serverssisscd2

Identify the Column changed in SCD Type 2 in SSIS SQL server


I am tracking data in my SCD table as shown below image using the SSIS package.

enter image description here

I need to add a new column, the "Column Updated" (as depicted above) which represents what columns were updated between N and N-1 transaction. This can be achieved by Cursor however I am looking for suggestions to do this in an efficient way. Would it be possible to perform within SCD or any other inbuilt SQL server function?

adding script:

    Create table SCDtest 
(
 id int ,
 empid int ,
 Deptid varchar(10),
 Ename varchar(50),
 DeptName varchar(50),
 city varchar(50),
 startdate datetime,
 Enddate datetime , 
 ColumnUpdated varchar(500)
)

Insert into SCDtest values (1, 1, 'D1', 'Mike', 'Account', 'Atlanta', '7/31/2020', '8/3/2020','' )
Insert into SCDtest values (2, 2, 'D2', 'Roy', 'IT', 'New York', '7/31/2020', '8/5/2020','' )
Insert into SCDtest values (3, 1, 'D1', 'Ross', 'Account', 'Atlanta', '8/4/2020', '8/7/2020','' )
Insert into SCDtest values (4, 2, 'D2', 'Roy', 'IT', 'Los angeles', '8/5/2020',NULL ,'' )
Insert into SCDtest values (5, 1, 'D1', 'John', 'Marketing', 'Boston', '8/8/2020', NULL,'')

Thank you


Solution

  • Honestly I don't really know why you need this functionality as you can very easily just look at the two rows to see any changes, on the off chance that you do actually need to see them. I've never needed a ColumnUpdated type value and I don't think the processing required to generate one and the storage to hold the data is worth having it.


    That said, here is one way you can calculate the desired output from your given test data. Ideally you would do this in a more efficient way as part of your ETL process that is updating the rows as they come in rather than all at once. Though this obviously required info about your ETL that you haven't included in your question:

    Query

    declare @SCDtest table(id int,empid int,Deptid varchar(10),Ename varchar(50),DeptName varchar(50),city varchar(50),startdate datetime,Enddate datetime);
    Insert into @SCDtest values(1, 1, 'D1', 'Mike', 'Account', 'Atlanta', '7/31/2020', '8/3/2020'),(2, 2, 'D2', 'Roy', 'IT', 'New York', '7/31/2020', '8/5/2020'),(3, 1, 'D1', 'Ross', 'Account', 'Atlanta', '8/4/2020', '8/7/2020'),(4, 2, 'D2', 'Roy', 'IT', 'Los angeles', '8/5/2020',NULL),(5, 1, 'D1', 'John', 'Marketing', 'Boston', '8/8/2020', NULL);
    
    with l as
    (
        select *
              ,lag(id,1) over (partition by empid order by id) as l
        from @SCDtest
    )
    select l.id
          ,l.empid
          ,l.Deptid
          ,l.Ename
          ,l.DeptName
          ,l.city
          ,l.startdate
          ,l.Enddate
          ,stuff(concat(case when l.Deptid <> t.Deptid then ', Deptid' end
                       ,case when l.Ename <> t.Ename then ', Ename' end
                       ,case when l.DeptName <> t.DeptName then ', DeptName' end
                       ,case when l.city <> t.city then ', city' end
                       )
                ,1,2,''
                ) as ColumnUpdated
    from l
        left join @SCDtest as t
            on l.l = t.id
    order by l.empid
            ,l.startdate;
    

    Output

    +----+-------+--------+-------+-----------+-------------+-------------------------+-------------------------+-----------------------+
    | id | empid | Deptid | Ename | DeptName  |    city     |        startdate        |         Enddate         |     ColumnUpdated     |
    +----+-------+--------+-------+-----------+-------------+-------------------------+-------------------------+-----------------------+
    |  1 |     1 | D1     | Mike  | Account   | Atlanta     | 2020-07-31 00:00:00.000 | 2020-08-03 00:00:00.000 | NULL                  |
    |  3 |     1 | D1     | Ross  | Account   | Atlanta     | 2020-08-04 00:00:00.000 | 2020-08-07 00:00:00.000 | Ename                 |
    |  5 |     1 | D1     | John  | Marketing | Boston      | 2020-08-08 00:00:00.000 | NULL                    | Ename, DeptName, city |
    |  2 |     2 | D2     | Roy   | IT        | New York    | 2020-07-31 00:00:00.000 | 2020-08-05 00:00:00.000 | NULL                  |
    |  4 |     2 | D2     | Roy   | IT        | Los angeles | 2020-08-05 00:00:00.000 | NULL                    | city                  |
    +----+-------+--------+-------+-----------+-------------+-------------------------+-------------------------+-----------------------+