In SQL Server, I am having source data like this. The as_of_date|issue_id|issue_status_name|prev_issue_status_name are in my source table and desired output is what I am looking for.
Basically need to compare issue_status_name with prev_issue_status_name and when prev_issue_status_name in (null,open,draft) and issue_status_name in (pending sustainability, pending validation, closed) then we should set the as_of_date and populate in preceding rows until reverse happens.
I tried to do this with first writing case statement to populate 1 else 0 then increment the number but I am not sure how to do for each occurance. Can someone guide me in the right direction please?
as_of_date | issue_id | issue_status_name | prev_issue_status_name | desired_output |
---|---|---|---|---|
12/31/2019 | 123456 | Draft | NULL | NULL |
1/1/2020 | 123456 | Draft | Draft | NULL |
1/2/2020 | 123456 | Draft | Draft | NULL |
1/3/2020 | 123456 | Pending Validation | Draft | 1/3/2020 |
1/4/2020 | 123456 | Pending Validation | Pending Validation | 1/3/2020 |
1/5/2020 | 123456 | Pending Validation | Pending Validation | 1/3/2020 |
1/6/2020 | 123456 | Pending Validation | Pending Validation | 1/3/2020 |
1/7/2020 | 123456 | Open | Pending Validation | NULL |
1/8/2020 | 123456 | Open | Open | NULL |
1/9/2020 | 123456 | Open | Open | NULL |
1/10/2020 | 123456 | Open | Open | NULL |
1/11/2020 | 123456 | Pending Sustainability | Open | 1/11/2020 |
1/12/2020 | 123456 | Pending Sustainability | Pending Sustainability | 1/11/2020 |
1/13/2020 | 123456 | Pending Sustainability | Pending Sustainability | 1/11/2020 |
1/14/2020 | 123456 | Pending Sustainability | Pending Sustainability | 1/11/2020 |
1/15/2020 | 123456 | Pending Validation | Pending Sustainability | 1/11/2020 |
1/16/2020 | 123456 | Pending Validation | Pending Validation | 1/11/2020 |
1/17/2020 | 123456 | Pending Validation | Pending Validation | 1/11/2020 |
1/18/2020 | 123456 | Pending Validation | Pending Validation | 1/11/2020 |
1/19/2020 | 123456 | Pending Validation | Pending Validation | 1/11/2020 |
1/20/2020 | 123456 | Pending Validation | Pending Validation | 1/11/2020 |
1/21/2020 | 123456 | Open | Pending Validation | NULL |
1/22/2020 | 123456 | Open | Open | NULL |
1/23/2020 | 123456 | Open | Open | NULL |
1/24/2020 | 123456 | Open | Open | NULL |
1/25/2020 | 123456 | Open | Open | NULL |
1/26/2020 | 123456 | Open | Open | NULL |
1/27/2020 | 123456 | Closed | Open | 1/27/2020 |
1/28/2020 | 123456 | Closed | Closed | 1/27/2020 |
1/29/2020 | 123456 | Closed | Closed | 1/27/2020 |
1/30/2020 | 123456 | Closed | Closed | 1/27/2020 |
1/31/2020 | 123456 | Closed | Closed | 1/27/2020 |
I tried the following:
CASE WHEN issue_status_name in (pending validation, closed, pending sustainability) and prev_issue_status_name in (null, open, draft) then sum(case when prev_issue_status_name not in (pending validation, closed, pending sustainability) then as_of_date else null
But it's not giving correct output.
You don't seem to actually care about previous status per se but rather that a run of consecutive statuses of the three interesting ones (pending/closed) should be treated as a unit and then return the first as_of_date
from among them. This uses a standard row-numbering gaps and islands technique:
with data as (
select *,
row_number() over (
partition by issue_id order by as_of_date) -
row_number() over (
partition by issue_id, output_flag order by as_of_date) as grp
from T cross apply (values (
case when issue_status_name
in ('Pending Validation', 'Pending Sustainability', 'Closed')
then 1 else 0 end)) v(output_flag) /* not sure what's a good name */
)
select as_of_date, issue_id, issue_status_name, prev_issue_status_name,
case when output_flag = 1
then min(as_of_date) over (partition by issue_id, grp) end as desired_output
from data;