I have data that looks like this:
invoice_number | activity_date | activity_number | activity_description | system_process | event_number | next_event_date | next_event | next_event_description | next_event_system_process | days_until_next_event |
---|---|---|---|---|---|---|---|---|---|---|
12 | 10/15/2017 | 780 | PREPARING TO SEND | Account Assigned to Vendor | 1 | 10/15/2017 | 780 | PREPARING TO SEND | system | 0 |
12 | 10/15/2017 | 780 | PREPARING TO SEND | system | 2 | 10/16/2017 | 781 | WITH VENDOR | system | 1 |
12 | 10/16/2017 | 781 | WITH VENDOR | system | 3 | 4/1/2018 | 781 | WITH VENDOR | Account To Be Closed With Vendor | 167 |
12 | 4/1/2018 | 781 | WITH VENDOR | Account To Be Closed With Vendor | 4 | 5/2/2018 | 781 | WITH VENDOR | Account To Be Closed With Vendor | 31 |
12 | 5/2/2018 | 781 | WITH VENDOR | Account To Be Closed With Vendor | 5 | 5/29/2018 | 782 | RETURNED | system | 27 |
12 | 5/29/2018 | 782 | RETURNED | system | 6 | 11/10/2019 | 783 | PREPARING TO SEND | system | 530 |
12 | 11/10/2019 | 783 | PREPARING TO SEND | Account Reassigned to Vendor | 7 | 11/11/2019 | 781 | WITH VENDOR | system | 1 |
12 | 11/11/2019 | 781 | WITH VENDOR | system | 8 | 11/12/2019 | 782 | RETURNED | system | 1 |
12 | 11/12/2019 | 782 | RETURNED | system | 9 | 10/8/2020 | 781 | WITH VENDOR | system | 331 |
12 | 10/8/2020 | 781 | WITH VENDOR | system | 10 | 7/16/2022 | 782 | RETURNED | system | 646 |
12 | 7/16/2022 | 782 | RETURNED | system | 11 | NULL | NULL | NULL | NULL | NULL |
I want get a result that looks like this:
invoice_number | activity_date | activity_number | activity_description | system_process | event_number | next_event_date | next_event | next_event_description |
---|---|---|---|---|---|---|---|---|
12 | 10/16/2017 | 781 | WITH Vendor | system | 3 | 5/29/2018 | 782 | RETURNED |
12 | 11/11/2019 | 781 | WITH Vendor | system | 8 | 11/12/2019 | 782 | RETURNED |
12 | 10/8/2020 | 781 | WITH Vendor | system | 10 | 7/16/2022 | 782 | RETURNED |
I have written the following T-SQL query:
DROP TABLE IF EXISTS #EVENT_TEMP;
WITH CTE AS (
SELECT A.invoice_no,
A.activity_date,
A.activity,
A.activity_description,
A.system_process,
A.event_number,
B.next_event_date,
B.next_event,
B.next_event_description,
[with_vendor_flag_number] = ROW_NUMBER() OVER (
PARTITION BY A.invoice_no,
A.activity_description,
B.NEXT_EVENT_DATE ORDER BY A.invoice_no,
A.activity_date
)
FROM DBO.c_activity_base_tbl AS A
OUTER APPLY (
SELECT TOP 1 Z.*
FROM DBO.c_activity_base_tbl AS Z
WHERE a.invoice_no = z.invoice_no
AND A.activity_date <= Z.next_event_date
AND A.event_number < Z.event_number
AND (
Z.next_event_description = 'RETURNED'
OR Z.next_event_description IS NULL
)
AND NOT EXISTS (
SELECT 1
FROM DBO.c_activity_base_tbl AS X
WHERE Z.invoice_no = X.invoice_no
AND Z.activity_description = 'RETURNED'
AND X.next_event_description = 'RETURNED'
)
) AS B
WHERE A.activity_description = 'WITH VENDOR'
AND A.pt_no = '12'
)
SELECT *,
[prev_event_date] = ISNULL(LAG(CTE.next_event_date, 1) OVER(
PARTITION BY CTE.invoice_no
ORDER BY CTE.activity_date),
activity_date)
INTO #EVENT_TEMP
FROM CTE
WHERE with_vendor_flag_number = 1
ORDER BY event_number;
SELECT *
FROM #EVENT_TEMP
ORDER BY activity_date;
This returns the following:
invoice_number | activity_date | activity_number | activity_description | system_process | event_number | next_event_date | next_event | next_event_description | prev_event_date |
---|---|---|---|---|---|---|---|---|---|
12 | 2017-10-16 | 781 | WITH VENDOR | RPM | 3 | 2018-05-29 | 782 | RETURNED | 2017-10-16 |
12 | 2018-05-02 | 781 | WITH VENDOR | Account To Be Closed With Kopp | 5 | 2019-11-12 | 782 | RETURNED | 2018-05-29 |
12 | 2019-11-11 | 781 | WITH VENDOR | RPM | 8 | 2022-07-16 | 782 | RETURNED | 2019-11-12 |
12 | 2020-10-08 | 781 | WITH VENDOR | RPM | 10 | NULL | NULL | NULL | 2022-07-16 |
The data I do not want is:
invoice_number | activity_date | activity_number | activity_description | system_process | event_number | next_event_date | next_event | next_event_description | prev_event_date |
---|---|---|---|---|---|---|---|---|---|
12 | 2018-05-02 | 781 | WITH VENDOR | Account To Be Closed With Kopp | 5 |
For clarification:
The base table gives the sequence of events that happen provided in the first table shown, from these records I only want certain records, specifically I only want the records that match the following.
I want to see the first record (with minimum activity_date) that occurs before an invoice is returned. For example from the first table I want to see the following records that occur before being returned:
DATE | ACTIVITY |
---|---|
10/16/2017 | WITH VENDOR |
11/11/2019 | WITH VENDOR |
10/08/2020 | WITH VENDOR |
I then want to see the corresponding return, like so:
DATE | ACTIVITY |
---|---|
05/29/2018 | RETURNED |
11/12/2019 | RETURNED |
07/16/2022 | RETURNED |
The record I said I don't want is unwanted because it is not the first occurrence of the invoice going to the vendor before it was returned by them.
The data flows in such a way that a record can be marked WITH VENDOR multiple times before it is RETURNED AND can be RETURNED multiple times before it goes back out. I have no control over that.
The clock resets after every RETURN. I am only concerned about records that will show the time an invoice is with a vendor, not when it is not. Preparing to send is not with the vendor which is why it is not important. I am thinking of it like a chain of events. I want to see how many times an invoice goes to a vendor and how long it is with them. I want to measure things like time with them and other down stream events that could occur while an invoice is with them.
I will always want the earliest WITH VENDOR that occurs before the first RETURN record shows up and the first WITH VENDOR record after the aforementioned RETURN if it exists.
It is sounding like you want logic to:
The first step is a straightforward filter.
Eliminating the duplicates can be done with the help of the LAG()
function and a comparison, with care to candle the null-lag-value case.
The final matchup between the "WITH VENDOR" and "RETURN" rows can be done by assigning rows numbers to the filtered data and then joining the "WITH VENDOR" rows with the immediately following (row number + 1) "RETURN" rows. A LEFT JOIN
can be used to handle the no-yet-returned case.
The result is something like.
WITH CTE_FilteredActivity AS (
SELECT
FA1.*,
ROW_NUMBER() OVER(
PARTITION BY FA1.invoice_number
ORDER BY FA1.event_number
) AS RowNum
FROM (
SELECT
A.*,
LAG(A.activity_number) OVER(
PARTITION BY A.invoice_number
ORDER BY A.event_number
) AS PriorActivityNumber
FROM c_activity_base_tbl A
WHERE A.activity_number IN (781, 782) -- 'WITH VENDOR', 'RETURNED'
) FA1
WHERE FA1.activity_number <> ISNULL(FA1.PriorActivityNumber, '')
)
SELECT
FA1.invoice_number,
FA1.event_number,
FA1.activity_date,
FA1.activity_description,
FA2.event_number,
FA2.activity_date,
FA2.activity_description,
DATEDIFF(day, FA1.activity_date, FA2.activity_date) AS ElapsedDays,
DATEDIFF(day, FA1.activity_date, ISNULL(FA2.activity_date, GETDATE())) AS ElapsedDays2
FROM CTE_FilteredActivity FA1
LEFT JOIN CTE_FilteredActivity FA2
ON FA2.invoice_number = FA1.invoice_number
AND FA2.RowNum = FA1.RowNum + 1
AND FA2.activity_number = 782 -- 'RETURNED'
WHERE FA1.activity_number = 781 -- 'WITH VENDOR'
ORDER BY FA1.invoice_number, FA1.event_number
The above includes a few additional assumptions:
event_number
is a more reliable sequence than activity_date
(since the latter could potentially have duplicates).activity_number
and activity_description
columns are a 1:1 match. As such, filtering was done using activity_number
.Your supplied sample data looks to already be the results of an earlier query or view. If desired, the above logic can be adapted to directly reference the underlying tables.
CTE preliminary results:
invoice number |
activity date |
activity number |
activity description |
system process |
event number |
Prior Activity Number |
RowNum |
---|---|---|---|---|---|---|---|
12 | 2017-10-16 | 781 | WITH VENDOR | system | 3 | null | 1 |
12 | 2018-05-29 | 782 | RETURNED | system | 6 | 781 | 2 |
12 | 2019-11-11 | 781 | WITH VENDOR | system | 8 | 782 | 3 |
12 | 2019-11-12 | 782 | RETURNED | system | 9 | 781 | 4 |
12 | 2020-10-08 | 781 | WITH VENDOR | system | 10 | 782 | 5 |
12 | 2022-07-16 | 782 | RETURNED | system | 11 | 781 | 6 |
Final results:
invoice number |
event number |
activity date |
activity description |
event number |
activity date |
activity description |
Elapsed Days |
---|---|---|---|---|---|---|---|
12 | 3 | 2017-10-16 | WITH VENDOR | 6 | 2018-05-29 | RETURNED | 225 |
12 | 8 | 2019-11-11 | WITH VENDOR | 9 | 2019-11-12 | RETURNED | 1 |
12 | 10 | 2020-10-08 | WITH VENDOR | 11 | 2022-07-16 | RETURNED | 646 |
See this db<>fiddle for a demo, including extra test data, including an anomalous leading "RETURN", an unmatched trailing "WITH VENDOR", plus a mix of repeated statuses.