I would like to track changes to a dataset in SQL Server.
I currently have data similar to this
Unique Ref FromDate ToDate Status
1 01/01/2020 03/01/2020 A
1 03/01/2020 03/02/2020 NULL
1 03/02/2020 04/04/2020 B
1 04/04/2020 05/04/2020 B
1 05/04/2020 06/06/2020 A
2 03/01/2020 05/01/2020 NULL
2 05/01/2020 06/07/2020 B
2 06/07/2020 07/07/2020 B
2 07/07/2020 08/07/2020 A
I'd like to keep a row for each change to a status grouped by Unique Ref, so that I only see the changes and not where the Status stays the same.
Trying to get the above to look like
Unique Ref Status ChangedDate
1 A 01/01/2020
1 NULL 03/01/2020
1 B 03/02/2020
1 A 05/04/2020
2 NULL 03/01/2020
2 B 05/01/2020
2 A 07/07/2020
I've never done anything similar before so not sure where to start.
Use lag()
. I think this does what you want:
select t.ref, t.status, t.fromDate as changedDate
from (select t.*,
lag(status) over (partition by ref order by fromDate) as prev_status,
row_number() over (partition by ref order fromDate) as seqnum
from t
) t
where prev_status <> status or
prev_status is null and status is not null or
prev_status is not null and status is null or
seqnum = 1;
The NULL
values make this a bit tricky. The last condition handles the case when the first row for a ref is NULL
.
Another approach is a little simpler I think:
select t.ref, t.status, t.fromDate as changedDate
from (select t.*,
lag(fromDate) over (partition by ref order by fromDate) as prev_date,
lag(fromDate) over (partition by ref, status order by fromDate) as prev_status_date
from t
) t
where prev_status_date is null or
prev_status_date <> prev_date;
This just compares the previous date for the status to the previous date for record as a whole.