Search code examples
sqlsql-servertracking

Keep a row for each change in column (including NULL) in SQL Server


I would like to track changes to a dataset in SQL Server.

I currently have data similar to this

    Unique Ref  FromDate    ToDate     Status
           1    01/01/2020  03/01/2020  A
           1    03/01/2020  03/02/2020  NULL
           1    03/02/2020  04/04/2020  B
           1    04/04/2020  05/04/2020  B
           1    05/04/2020  06/06/2020  A
           2    03/01/2020  05/01/2020  NULL
           2    05/01/2020  06/07/2020  B
           2    06/07/2020  07/07/2020  B
           2    07/07/2020  08/07/2020  A

I'd like to keep a row for each change to a status grouped by Unique Ref, so that I only see the changes and not where the Status stays the same.

Trying to get the above to look like

Unique Ref  Status  ChangedDate
         1  A       01/01/2020
         1  NULL    03/01/2020
         1  B       03/02/2020
         1  A       05/04/2020
         2  NULL    03/01/2020
         2  B       05/01/2020
         2  A       07/07/2020

I've never done anything similar before so not sure where to start.


Solution

  • Use lag(). I think this does what you want:

    select t.ref, t.status, t.fromDate as changedDate
    from (select t.*,
                 lag(status) over (partition by ref order by fromDate) as prev_status,
                 row_number() over (partition by ref order fromDate) as seqnum
          from t
         ) t
    where prev_status <> status or
          prev_status is null and status is not null or
          prev_status is not null and status is null or
          seqnum = 1;
    

    The NULL values make this a bit tricky. The last condition handles the case when the first row for a ref is NULL.

    Another approach is a little simpler I think:

    select t.ref, t.status, t.fromDate as changedDate
    from (select t.*,
                 lag(fromDate) over (partition by ref order by fromDate) as prev_date,
                 lag(fromDate) over (partition by ref, status order by fromDate) as prev_status_date
          from t
         ) t
    where prev_status_date is null or
          prev_status_date <> prev_date;
    

    This just compares the previous date for the status to the previous date for record as a whole.