Search code examples
sqlsql-servert-sqlwindow-functions

Split Table into Windows with Recurring Attributes


My title is awful, because I am not sure how to describe the challenge. I would love an edit if someone can think of a more descriptive title. Hopefully my input/desired output will help explain. Here is some sample input data:

create table #input (
    num varchar(10),
    code varchar(10),
    event_date date
)

insert into #input (num, code, event_date)
values('123456', 'Active', '2007-09-10'),
      ('123456', 'Active', '2010-09-15'),
      ('123456', 'Active', '2010-09-24'),
      ('123456', 'Inactive', '2018-09-17'),
      ('123456', 'Inactive', '2019-01-01'),
      ('123456', 'Active', '2019-02-08')

select *
from #input
order by event_date

I want to tag each record for each group of num + code with the same number. However, I want the time periods to stay separate. Here is the desired result:

create table #result (
    num varchar(10),
    code varchar(10),
    event_date date,
    tag int
)

insert into #result (num, code, event_date, tag)
values('123456', 'Active', '2007-09-10', 1),
      ('123456', 'Active', '2010-09-15', 1),
      ('123456', 'Active', '2010-09-24', 1),
      ('123456', 'Inactive', '2018-09-17', 2),
      ('123456', 'Inactive', '2019-01-01', 2),
      ('123456', 'Active', '2019-02-08', 3)

select *
from #result
order by event_date

Obviously normal window partitions like this...

select *, row_number() over(partition by num, code order by event_date) rn
from #input
order by event_date

...don't work, because there is no field on which to partition that would split the two "Active" groups (two groups, because they happen during two time frames). How would I reach my desired result? I have a hunch that a series of lag() and lead() functions might work, but I couldn't get anywhere meaningful.

Alternatively, how would I achieve the results so the categories overlap by one?

create table #result_new (
    num varchar(10),
    code varchar(10),
    event_date date,
    tag int
)

insert into #result (num, code, event_date, tag)
values('123456', 'Active', '2007-09-10', 1),
      ('123456', 'Active', '2010-09-15', 1),
      ('123456', 'Active', '2010-09-24', 1),
      ('123456', 'Inactive', '2018-09-17', 1),
      ('123456', 'Inactive', '2019-01-01', 2),
      ('123456', 'Active', '2019-02-08', 2)

select *
from #result_new
order by event_date

Solution

  • LAG gets your half way there, but not the whole way. You can use LAG to check the value of the last row, and create (what I have called) a switch. You can then use a SUM window function, with a ROWs BETWEEN clause to get the value for tag:

    WITH CTE AS(
          SELECT num,
                 code,
                 event_date,
                 CASE WHEN code = LAG(code) OVER (PARTITION BY num ORDER BY event_date) THEN 0 ELSE 1 END AS Switch
          FROM #input)
    SELECT num,
           code,
           event_date,
           SUM(Switch) OVER (PARTITION BY num ORDER BY event_date
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS tag
    FROM CTE;