Search code examples
sqlsql-serverpivot-table

Identify SKU id's that have changed status twice in a historical table


I have a historical table which has id, date and status columns. I am trying to identify all the instances where the id's have changed from status 1(Active) to 0(Inactive) and came back to 1(Active again). I want to create a pivot table with all the distinct id's as rows, and all the dates as column values. Since, this is a historical table and there are around 10M distinct id's, I am trying to see what would be an ideal/efficient way to pull the data.

The data looks like this:

id       date       status
I01234   5/12/2023  1
I14690   4/13/2021  0

The final structure of the data:

id      01/01/2021 01/02/2021 01/03/2021 ........ 10/31/2023
I01234  1          1          1                   0
I14690  1          0          0                   1

Solution

  • This will get the all the records for each id which have moved from active->inactive->active. Note that this will pick up a record that has this lifecycle at any point, regardless of it's current status.

    Also assumes that the pivot is not necessary and we only want the ids/records.

    There are two main conditions in the inactive_sandwich clause.

    The first checks to see (via EXISTS) if there are any records with active status for that id earlier than a inactive.

    Second is the same idea, but checks for active records later than the load date of the inactive.

    If both of these are true that means you have an earlier "active", a "inactive", and a later "active".

    This result will only give you the inactive records, so the last clause picks up all records which are associated with those ids.

    create table some_test_data 
      ( id varchar(100),
        status integer,
        load_date date
      )
    
    insert into some_test_data values ('abc',1,cast('2023-01-01' as datetime));
    insert into some_test_data values ('abc',0,cast('2023-01-02' as datetime));
    insert into some_test_data values ('abc',1,cast('2023-01-03' as datetime));
    insert into some_test_data values ('cde',1,cast('2023-01-04' as datetime));
    insert into some_test_data values ('cde',1,cast('2023-01-05' as datetime));
    insert into some_test_data values ('efg',1,cast('2023-01-05' as datetime));
    insert into some_test_data values ('efg',0,cast('2023-01-05' as datetime));
    
    with inactive_sandwich as (
        select *
          from some_test_data t1
         where t1.status = 0
           and exists
                 ( select 1
                     from some_test_data earlier_active
                    where t1.id = earlier_active.id
                      and t1.load_date > earlier_active.load_date
                      and earlier_active.status = 1                
                 )
           and exists
                 ( select 1
                     from some_test_data later_active
                    where t1.id = later_active.id
                      and t1.load_date < later_active.load_date
                      and later_active.status = 1                
                 )
    )
    select * 
      from some_test_data t1 
     where exists
             (
               select 1
                 from inactive_sandwich t2
                where t1.id = t2.id
    
             )