I am tracking data in my SCD table as shown below image using the SSIS package.
I need to add a new column, the "Column Updated" (as depicted above) which represents what columns were updated between N and N-1 transaction. This can be achieved by Cursor however I am looking for suggestions to do this in an efficient way. Would it be possible to perform within SCD or any other inbuilt SQL server function?
adding script:
Create table SCDtest
(
id int ,
empid int ,
Deptid varchar(10),
Ename varchar(50),
DeptName varchar(50),
city varchar(50),
startdate datetime,
Enddate datetime ,
ColumnUpdated varchar(500)
)
Insert into SCDtest values (1, 1, 'D1', 'Mike', 'Account', 'Atlanta', '7/31/2020', '8/3/2020','' )
Insert into SCDtest values (2, 2, 'D2', 'Roy', 'IT', 'New York', '7/31/2020', '8/5/2020','' )
Insert into SCDtest values (3, 1, 'D1', 'Ross', 'Account', 'Atlanta', '8/4/2020', '8/7/2020','' )
Insert into SCDtest values (4, 2, 'D2', 'Roy', 'IT', 'Los angeles', '8/5/2020',NULL ,'' )
Insert into SCDtest values (5, 1, 'D1', 'John', 'Marketing', 'Boston', '8/8/2020', NULL,'')
Thank you
Honestly I don't really know why you need this functionality as you can very easily just look at the two rows to see any changes, on the off chance that you do actually need to see them. I've never needed a ColumnUpdated
type value and I don't think the processing required to generate one and the storage to hold the data is worth having it.
That said, here is one way you can calculate the desired output from your given test data. Ideally you would do this in a more efficient way as part of your ETL process that is updating the rows as they come in rather than all at once. Though this obviously required info about your ETL that you haven't included in your question:
declare @SCDtest table(id int,empid int,Deptid varchar(10),Ename varchar(50),DeptName varchar(50),city varchar(50),startdate datetime,Enddate datetime);
Insert into @SCDtest values(1, 1, 'D1', 'Mike', 'Account', 'Atlanta', '7/31/2020', '8/3/2020'),(2, 2, 'D2', 'Roy', 'IT', 'New York', '7/31/2020', '8/5/2020'),(3, 1, 'D1', 'Ross', 'Account', 'Atlanta', '8/4/2020', '8/7/2020'),(4, 2, 'D2', 'Roy', 'IT', 'Los angeles', '8/5/2020',NULL),(5, 1, 'D1', 'John', 'Marketing', 'Boston', '8/8/2020', NULL);
with l as
(
select *
,lag(id,1) over (partition by empid order by id) as l
from @SCDtest
)
select l.id
,l.empid
,l.Deptid
,l.Ename
,l.DeptName
,l.city
,l.startdate
,l.Enddate
,stuff(concat(case when l.Deptid <> t.Deptid then ', Deptid' end
,case when l.Ename <> t.Ename then ', Ename' end
,case when l.DeptName <> t.DeptName then ', DeptName' end
,case when l.city <> t.city then ', city' end
)
,1,2,''
) as ColumnUpdated
from l
left join @SCDtest as t
on l.l = t.id
order by l.empid
,l.startdate;
+----+-------+--------+-------+-----------+-------------+-------------------------+-------------------------+-----------------------+
| id | empid | Deptid | Ename | DeptName | city | startdate | Enddate | ColumnUpdated |
+----+-------+--------+-------+-----------+-------------+-------------------------+-------------------------+-----------------------+
| 1 | 1 | D1 | Mike | Account | Atlanta | 2020-07-31 00:00:00.000 | 2020-08-03 00:00:00.000 | NULL |
| 3 | 1 | D1 | Ross | Account | Atlanta | 2020-08-04 00:00:00.000 | 2020-08-07 00:00:00.000 | Ename |
| 5 | 1 | D1 | John | Marketing | Boston | 2020-08-08 00:00:00.000 | NULL | Ename, DeptName, city |
| 2 | 2 | D2 | Roy | IT | New York | 2020-07-31 00:00:00.000 | 2020-08-05 00:00:00.000 | NULL |
| 4 | 2 | D2 | Roy | IT | Los angeles | 2020-08-05 00:00:00.000 | NULL | city |
+----+-------+--------+-------+-----------+-------------+-------------------------+-------------------------+-----------------------+