Search code examples
sql-servert-sqltransactionssequence

SQL query to get specific sequence of events


I have data that looks like this:

invoice_number activity_date activity_number activity_description system_process event_number next_event_date next_event next_event_description next_event_system_process days_until_next_event
12 10/15/2017 780 PREPARING TO SEND Account Assigned to Vendor 1 10/15/2017 780 PREPARING TO SEND system 0
12 10/15/2017 780 PREPARING TO SEND system 2 10/16/2017 781 WITH VENDOR system 1
12 10/16/2017 781 WITH VENDOR system 3 4/1/2018 781 WITH VENDOR Account To Be Closed With Vendor 167
12 4/1/2018 781 WITH VENDOR Account To Be Closed With Vendor 4 5/2/2018 781 WITH VENDOR Account To Be Closed With Vendor 31
12 5/2/2018 781 WITH VENDOR Account To Be Closed With Vendor 5 5/29/2018 782 RETURNED system 27
12 5/29/2018 782 RETURNED system 6 11/10/2019 783 PREPARING TO SEND system 530
12 11/10/2019 783 PREPARING TO SEND Account Reassigned to Vendor 7 11/11/2019 781 WITH VENDOR system 1
12 11/11/2019 781 WITH VENDOR system 8 11/12/2019 782 RETURNED system 1
12 11/12/2019 782 RETURNED system 9 10/8/2020 781 WITH VENDOR system 331
12 10/8/2020 781 WITH VENDOR system 10 7/16/2022 782 RETURNED system 646
12 7/16/2022 782 RETURNED system 11 NULL NULL NULL NULL NULL

I want get a result that looks like this:

invoice_number activity_date activity_number activity_description system_process event_number next_event_date next_event next_event_description
12 10/16/2017 781 WITH Vendor system 3 5/29/2018 782 RETURNED
12 11/11/2019 781 WITH Vendor system 8 11/12/2019 782 RETURNED
12 10/8/2020 781 WITH Vendor system 10 7/16/2022 782 RETURNED

I have written the following T-SQL query:

DROP TABLE IF EXISTS #EVENT_TEMP;
WITH CTE AS (
        SELECT A.invoice_no,
            A.activity_date,
            A.activity,
            A.activity_description,
            A.system_process,
            A.event_number,
            B.next_event_date,
            B.next_event,
            B.next_event_description,
            [with_vendor_flag_number] = ROW_NUMBER() OVER (
                PARTITION BY A.invoice_no,
                A.activity_description,
                B.NEXT_EVENT_DATE ORDER BY A.invoice_no,
                    A.activity_date
                )
        FROM DBO.c_activity_base_tbl AS A
        OUTER APPLY (
            SELECT TOP 1 Z.*
            FROM DBO.c_activity_base_tbl AS Z
            WHERE a.invoice_no = z.invoice_no
                AND A.activity_date <= Z.next_event_date
                AND A.event_number < Z.event_number
                AND (
                    Z.next_event_description = 'RETURNED'
                    OR Z.next_event_description IS NULL
                    )
                AND NOT EXISTS (
                    SELECT 1
                    FROM DBO.c_activity_base_tbl AS X
                    WHERE Z.invoice_no = X.invoice_no
                    AND Z.activity_description = 'RETURNED'
                    AND X.next_event_description = 'RETURNED'
                )
            ) AS B
        WHERE A.activity_description = 'WITH VENDOR'
        AND A.pt_no = '12'
    )

SELECT *,
    [prev_event_date] = ISNULL(LAG(CTE.next_event_date, 1) OVER(
        PARTITION BY CTE.invoice_no 
        ORDER BY CTE.activity_date), 
    activity_date)

INTO #EVENT_TEMP
FROM CTE 
WHERE with_vendor_flag_number = 1
ORDER BY event_number;

SELECT *
FROM #EVENT_TEMP
ORDER BY activity_date;

This returns the following:

invoice_number activity_date activity_number activity_description system_process event_number next_event_date next_event next_event_description prev_event_date
12 2017-10-16 781 WITH VENDOR RPM 3 2018-05-29 782 RETURNED 2017-10-16
12 2018-05-02 781 WITH VENDOR Account To Be Closed With Kopp 5 2019-11-12 782 RETURNED 2018-05-29
12 2019-11-11 781 WITH VENDOR RPM 8 2022-07-16 782 RETURNED 2019-11-12
12 2020-10-08 781 WITH VENDOR RPM 10 NULL NULL NULL 2022-07-16

The data I do not want is:

invoice_number activity_date activity_number activity_description system_process event_number next_event_date next_event next_event_description prev_event_date
12 2018-05-02 781 WITH VENDOR Account To Be Closed With Kopp 5

For clarification:

The base table gives the sequence of events that happen provided in the first table shown, from these records I only want certain records, specifically I only want the records that match the following.

I want to see the first record (with minimum activity_date) that occurs before an invoice is returned. For example from the first table I want to see the following records that occur before being returned:

DATE ACTIVITY
10/16/2017 WITH VENDOR
11/11/2019 WITH VENDOR
10/08/2020 WITH VENDOR

I then want to see the corresponding return, like so:

DATE ACTIVITY
05/29/2018 RETURNED
11/12/2019 RETURNED
07/16/2022 RETURNED

The record I said I don't want is unwanted because it is not the first occurrence of the invoice going to the vendor before it was returned by them.

The data flows in such a way that a record can be marked WITH VENDOR multiple times before it is RETURNED AND can be RETURNED multiple times before it goes back out. I have no control over that.

The clock resets after every RETURN. I am only concerned about records that will show the time an invoice is with a vendor, not when it is not. Preparing to send is not with the vendor which is why it is not important. I am thinking of it like a chain of events. I want to see how many times an invoice goes to a vendor and how long it is with them. I want to measure things like time with them and other down stream events that could occur while an invoice is with them.

I will always want the earliest WITH VENDOR that occurs before the first RETURN record shows up and the first WITH VENDOR record after the aforementioned RETURN if it exists.


Solution

  • It is sounding like you want logic to:

    1. Ignore all events other than "WITH VENDOR" and "RETURN".
    2. Eliminate duplicate "WITH VENDOR" and "RETURN" events preceded by the same. The remaining data now represents just the transitions.
    3. Match up each remaining "WITH VENDOR" with the immediately following "RETURN" (with a possibility of no match = still with vendor).
    4. If a "RETURN" is not preceded by a "WITH VENDOR", the return is ignored. (This condition is not expected.)

    The first step is a straightforward filter.

    Eliminating the duplicates can be done with the help of the LAG() function and a comparison, with care to candle the null-lag-value case.

    The final matchup between the "WITH VENDOR" and "RETURN" rows can be done by assigning rows numbers to the filtered data and then joining the "WITH VENDOR" rows with the immediately following (row number + 1) "RETURN" rows. A LEFT JOIN can be used to handle the no-yet-returned case.

    The result is something like.

    WITH CTE_FilteredActivity AS (
        SELECT
            FA1.*,
            ROW_NUMBER() OVER(
                PARTITION BY FA1.invoice_number
                ORDER BY FA1.event_number
                ) AS RowNum
        FROM (
            SELECT
                A.*,
                LAG(A.activity_number) OVER(
                    PARTITION BY A.invoice_number
                    ORDER BY A.event_number
                    ) AS PriorActivityNumber
            FROM c_activity_base_tbl A
            WHERE A.activity_number IN (781, 782) -- 'WITH VENDOR', 'RETURNED'
        ) FA1
        WHERE FA1.activity_number <> ISNULL(FA1.PriorActivityNumber, '')
    )
    SELECT
        FA1.invoice_number,
        FA1.event_number,
        FA1.activity_date,
        FA1.activity_description,
        FA2.event_number,
        FA2.activity_date,
        FA2.activity_description,
        DATEDIFF(day, FA1.activity_date, FA2.activity_date) AS ElapsedDays,
        DATEDIFF(day, FA1.activity_date, ISNULL(FA2.activity_date, GETDATE())) AS ElapsedDays2
    FROM CTE_FilteredActivity FA1
    LEFT JOIN CTE_FilteredActivity FA2
        ON FA2.invoice_number = FA1.invoice_number
        AND FA2.RowNum = FA1.RowNum + 1
        AND FA2.activity_number = 782 -- 'RETURNED'
    WHERE FA1.activity_number = 781 -- 'WITH VENDOR'
    ORDER BY FA1.invoice_number, FA1.event_number
    

    The above includes a few additional assumptions:

    1. The event_number is a more reliable sequence than activity_date (since the latter could potentially have duplicates).
    2. The activity_number and activity_description columns are a 1:1 match. As such, filtering was done using activity_number.
    3. I added an ElapsedDays calculation. If there is no return, that calculation could optionally use the current date.

    Your supplied sample data looks to already be the results of an earlier query or view. If desired, the above logic can be adapted to directly reference the underlying tables.

    CTE preliminary results:

    invoice
    number
    activity
    date
    activity
    number
    activity
    description
    system
    process
    event
    number
    Prior
    Activity
    Number
    RowNum
    12 2017-10-16 781 WITH VENDOR system 3 null 1
    12 2018-05-29 782 RETURNED system 6 781 2
    12 2019-11-11 781 WITH VENDOR system 8 782 3
    12 2019-11-12 782 RETURNED system 9 781 4
    12 2020-10-08 781 WITH VENDOR system 10 782 5
    12 2022-07-16 782 RETURNED system 11 781 6

    Final results:

    invoice
    number
    event
    number
    activity
    date
    activity
    description
    event
    number
    activity
    date
    activity
    description
    Elapsed
    Days
    12 3 2017-10-16 WITH VENDOR 6 2018-05-29 RETURNED 225
    12 8 2019-11-11 WITH VENDOR 9 2019-11-12 RETURNED 1
    12 10 2020-10-08 WITH VENDOR 11 2022-07-16 RETURNED 646

    See this db<>fiddle for a demo, including extra test data, including an anomalous leading "RETURN", an unmatched trailing "WITH VENDOR", plus a mix of repeated statuses.