Search code examples
sqlsql-servergaps-and-islands

SQL Server - Complex Scenario - comparing statuses and populating values to proceeding row


In SQL Server, I am having source data like this. The as_of_date|issue_id|issue_status_name|prev_issue_status_name are in my source table and desired output is what I am looking for.

Basically need to compare issue_status_name with prev_issue_status_name and when prev_issue_status_name in (null,open,draft) and issue_status_name in (pending sustainability, pending validation, closed) then we should set the as_of_date and populate in preceding rows until reverse happens.

I tried to do this with first writing case statement to populate 1 else 0 then increment the number but I am not sure how to do for each occurance. Can someone guide me in the right direction please?

as_of_date issue_id issue_status_name prev_issue_status_name desired_output
12/31/2019 123456 Draft NULL NULL
1/1/2020 123456 Draft Draft NULL
1/2/2020 123456 Draft Draft NULL
1/3/2020 123456 Pending Validation Draft 1/3/2020
1/4/2020 123456 Pending Validation Pending Validation 1/3/2020
1/5/2020 123456 Pending Validation Pending Validation 1/3/2020
1/6/2020 123456 Pending Validation Pending Validation 1/3/2020
1/7/2020 123456 Open Pending Validation NULL
1/8/2020 123456 Open Open NULL
1/9/2020 123456 Open Open NULL
1/10/2020 123456 Open Open NULL
1/11/2020 123456 Pending Sustainability Open 1/11/2020
1/12/2020 123456 Pending Sustainability Pending Sustainability 1/11/2020
1/13/2020 123456 Pending Sustainability Pending Sustainability 1/11/2020
1/14/2020 123456 Pending Sustainability Pending Sustainability 1/11/2020
1/15/2020 123456 Pending Validation Pending Sustainability 1/11/2020
1/16/2020 123456 Pending Validation Pending Validation 1/11/2020
1/17/2020 123456 Pending Validation Pending Validation 1/11/2020
1/18/2020 123456 Pending Validation Pending Validation 1/11/2020
1/19/2020 123456 Pending Validation Pending Validation 1/11/2020
1/20/2020 123456 Pending Validation Pending Validation 1/11/2020
1/21/2020 123456 Open Pending Validation NULL
1/22/2020 123456 Open Open NULL
1/23/2020 123456 Open Open NULL
1/24/2020 123456 Open Open NULL
1/25/2020 123456 Open Open NULL
1/26/2020 123456 Open Open NULL
1/27/2020 123456 Closed Open 1/27/2020
1/28/2020 123456 Closed Closed 1/27/2020
1/29/2020 123456 Closed Closed 1/27/2020
1/30/2020 123456 Closed Closed 1/27/2020
1/31/2020 123456 Closed Closed 1/27/2020

I tried the following:

CASE WHEN issue_status_name in (pending validation, closed, pending sustainability) and prev_issue_status_name in (null, open, draft) then sum(case when prev_issue_status_name not in (pending validation, closed, pending sustainability) then as_of_date else null

But it's not giving correct output.


Solution

  • You don't seem to actually care about previous status per se but rather that a run of consecutive statuses of the three interesting ones (pending/closed) should be treated as a unit and then return the first as_of_date from among them. This uses a standard row-numbering gaps and islands technique:

    with data as (
        select *,
            row_number() over (
                partition by issue_id order by as_of_date) -
            row_number() over (
                partition by issue_id, output_flag order by as_of_date) as grp
        from T cross apply (values (
            case when issue_status_name
                      in ('Pending Validation', 'Pending Sustainability', 'Closed')
                 then 1 else 0 end)) v(output_flag) /* not sure what's a good name */
    )
    select as_of_date, issue_id, issue_status_name, prev_issue_status_name,
        case when output_flag = 1
             then min(as_of_date) over (partition by issue_id, grp) end as desired_output
    from data;
    

    https://dbfiddle.uk/6i92BbzL?hide=2