I have a table holding customer invoice data. I am trying to find how many consequently months a particular transaction type is present on a customer’s invoice by creating a carryover counter. If the transaction is no longer present, the counter should reset to zero.
Table:
+------------+-------------+----------------+----------+
| Invoice_Id | Customer_id | Transaction_id | Sequence |
+------------+-------------+----------------+----------+
| 253442 | 23334 | | 1 |
| 253443 | 23334 | | 2 |
| 253444 | 23334 | | 3 |
| 253445 | 23334 | | 4 |
| 1050646 | 23334 | | 5 |
| 8457065 | 23334 | | 6 |
| 9052920 | 23334 | | 7 |
| 9333044 | 23334 | | 8 |
| 9616743 | 23334 | | 9 |
| 9894491 | 23334 | | 10 |
| 10186697 | 23334 | | 11 |
| 10490938 | 23334 | | 12 |
| 10803986 | 23334 | 69709477 | 13 |
| 11132317 | 23334 | 72103163 | 14 |
| 11444923 | 23334 | | 15 |
+------------+-------------+----------------+----------+
What I am trying to achieve:
+------------+-------------+----------------+----------+-----------+
| Invoice_Id | Customer_id | Transaction_id | Sequence | Carryover |
+------------+-------------+----------------+----------+-----------+
| 253442 | 23334 | | 1 | 0 |
| 253443 | 23334 | | 2 | 0 |
| 253444 | 23334 | | 3 | 0 |
| 253445 | 23334 | | 4 | 0 |
| 1050646 | 23334 | | 5 | 0 |
| 8457065 | 23334 | | 6 | 0 |
| 9052920 | 23334 | | 7 | 0 |
| 9333044 | 23334 | | 8 | 0 |
| 9616743 | 23334 | | 9 | 0 |
| 9894491 | 23334 | | 10 | 0 |
| 10186697 | 23334 | | 11 | 0 |
| 10490938 | 23334 | | 12 | 0 |
| 10803986 | 23334 | 69709477 | 13 | 1 |
| 11132317 | 23334 | 72103163 | 14 | 2 |
| 11444923 | 23334 | | 15 | 0 |
+------------+-------------+----------------+----------+-----------+
I assume I can use analytic functions / Oracle cte?
Thanks!
add:
Cumulative sum of transaction_count with reset when transaction_count = 0.
+------------+-------------+-------------------+----------+-----------+
| Invoice_Id | Customer_id | Transaction_Count | Sequence | Carryover |
+------------+-------------+-------------------+----------+-----------+
| 253442 | 23334 | 0 | 1 | 0 |
| 253443 | 23334 | 0 | 2 | 0 |
| 253444 | 23334 | 1 | 3 | 1 |
| 253445 | 23334 | 1 | 4 | 2 |
| 1050646 | 23334 | 0 | 5 | 0 |
| 8457065 | 23334 | 0 | 6 | 0 |
| 9052920 | 23334 | 2 | 7 | 2 |
| 9333044 | 23334 | 1 | 8 | 3 |
| 9616743 | 23334 | 0 | 9 | 0 |
| 9894491 | 23334 | 0 | 10 | 0 |
| 10186697 | 23334 | 0 | 11 | 0 |
| 10490938 | 23334 | 0 | 12 | 0 |
| 10803986 | 23334 | 1 | 13 | 1 |
| 11132317 | 23334 | 1 | 14 | 2 |
| 11444923 | 23334 | 0 | 15 | 0 |
+------------+-------------+-------------------+----------+-----------+
Assuming I understand your requirement correctly, here's one way, which uses the Tabibitosan method to "group" the data based on onsecutive null / not-null transaction_ids. Once we have that information, we can then do a conditional row_number() based on whether the transaction_id is null or not.
WITH sample_data AS (SELECT 253442 Invoice_Id, 23334 Customer_id, NULL Transaction_id, 1 seq FROM dual UNION ALL
SELECT 253443 Invoice_Id, 23334 Customer_id, NULL Transaction_id, 2 seq FROM dual UNION ALL
SELECT 253444 Invoice_Id, 23334 Customer_id, NULL Transaction_id, 3 seq FROM dual UNION ALL
SELECT 253445 Invoice_Id, 23334 Customer_id, 123 Transaction_id, 4 seq FROM dual UNION ALL
SELECT 1050646 Invoice_Id, 23334 Customer_id, 456 Transaction_id, 5 seq FROM dual UNION ALL
SELECT 8457065 Invoice_Id, 23334 Customer_id, 789 Transaction_id, 6 seq FROM dual UNION ALL
SELECT 9052920 Invoice_Id, 23334 Customer_id, NULL Transaction_id, 7 seq FROM dual UNION ALL
SELECT 9333044 Invoice_Id, 23334 Customer_id, NULL Transaction_id, 8 seq FROM dual UNION ALL
SELECT 9616743 Invoice_Id, 23334 Customer_id, NULL Transaction_id, 9 seq FROM dual UNION ALL
SELECT 9894491 Invoice_Id, 23334 Customer_id, NULL Transaction_id, 10 seq FROM dual UNION ALL
SELECT 10186697 Invoice_Id, 23334 Customer_id, NULL Transaction_id, 11 seq FROM dual UNION ALL
SELECT 10490938 Invoice_Id, 23334 Customer_id, NULL Transaction_id, 12 seq FROM dual UNION ALL
SELECT 10803986 Invoice_Id, 23334 Customer_id, 69709477 Transaction_id, 13 seq FROM dual UNION ALL
SELECT 11132317 Invoice_Id, 23334 Customer_id, 72103163 Transaction_id, 14 seq FROM dual UNION ALL
SELECT 11444923 Invoice_Id, 23334 Customer_id, NULL Transaction_id, 15 seq FROM dual)
-- end of mimicking your data in a "table" called sample_data
-- you wouldn't need this - you'd just select from your table directly in the sql below:
SELECT invoice_id,
customer_id,
transaction_id,
seq,
CASE WHEN transaction_id is not NULL THEN
row_number() OVER (PARTITION BY customer_id, grp ORDER BY seq)
ELSE 0
END carryover
FROM (SELECT invoice_id,
customer_id,
transaction_id,
seq,
row_number() OVER (PARTITION BY customer_id ORDER BY seq)
- row_number() OVER (PARTITION BY customer_id, CASE WHEN transaction_id IS NULL THEN 0 ELSE 1 END ORDER BY seq) grp
FROM sample_data)
ORDER BY customer_id, seq;
INVOICE_ID CUSTOMER_ID TRANSACTION_ID SEQ CARRYOVER
---------- ----------- -------------- ---------- ----------
253442 23334 1 0
253443 23334 2 0
253444 23334 3 0
253445 23334 123 4 1
1050646 23334 456 5 2
8457065 23334 789 6 3
9052920 23334 7 0
9333044 23334 8 0
9616743 23334 9 0
9894491 23334 10 0
10186697 23334 11 0
10490938 23334 12 0
10803986 23334 69709477 13 1
11132317 23334 72103163 14 2
11444923 23334 15 0
For the additional requirement:
WITH sample_data AS (SELECT 253442 Invoice_Id, 23334 Customer_id, 0 Transaction_Count, 1 seq FROM dual UNION ALL
SELECT 253443 Invoice_Id, 23334 Customer_id, 0 Transaction_Count, 2 seq FROM dual UNION ALL
SELECT 253444 Invoice_Id, 23334 Customer_id, 1 Transaction_Count, 3 seq FROM dual UNION ALL
SELECT 253445 Invoice_Id, 23334 Customer_id, 1 Transaction_Count, 4 seq FROM dual UNION ALL
SELECT 1050646 Invoice_Id, 23334 Customer_id, 0 Transaction_Count, 5 seq FROM dual UNION ALL
SELECT 8457065 Invoice_Id, 23334 Customer_id, 0 Transaction_Count, 6 seq FROM dual UNION ALL
SELECT 9052920 Invoice_Id, 23334 Customer_id, 2 Transaction_Count, 7 seq FROM dual UNION ALL
SELECT 9333044 Invoice_Id, 23334 Customer_id, 1 Transaction_Count, 8 seq FROM dual UNION ALL
SELECT 9616743 Invoice_Id, 23334 Customer_id, 0 Transaction_Count, 9 seq FROM dual UNION ALL
SELECT 9894491 Invoice_Id, 23334 Customer_id, 0 Transaction_Count, 10 seq FROM dual UNION ALL
SELECT 10186697 Invoice_Id, 23334 Customer_id, 0 Transaction_Count, 11 seq FROM dual UNION ALL
SELECT 10490938 Invoice_Id, 23334 Customer_id, 0 Transaction_Count, 12 seq FROM dual UNION ALL
SELECT 10803986 Invoice_Id, 23334 Customer_id, 1 Transaction_Count, 13 seq FROM dual UNION ALL
SELECT 11132317 Invoice_Id, 23334 Customer_id, 1 Transaction_Count, 14 seq FROM dual UNION ALL
SELECT 11444923 Invoice_Id, 23334 Customer_id, 0 Transaction_Count, 15 seq FROM dual)
-- end of mimicking your data in a "table" called sample_data
-- you wouldn't need this - you'd just select from your table directly in the sql below:
SELECT invoice_id,
customer_id,
Transaction_Count,
seq,
SUM(transaction_count) OVER (PARTITION BY customer_id,
CASE WHEN Transaction_Count = 0 THEN 0 ELSE 1 END,
grp
ORDER BY seq) carryover
FROM (SELECT invoice_id,
customer_id,
Transaction_Count,
seq,
row_number() OVER (PARTITION BY customer_id
ORDER BY seq)
- row_number() OVER (PARTITION BY customer_id,
CASE WHEN Transaction_Count = 0 THEN 0 ELSE 1 END
ORDER BY seq) grp
FROM sample_data)
ORDER BY customer_id, seq;
INVOICE_ID CUSTOMER_ID TRANSACTION_COUNT SEQ CARRYOVER
---------- ----------- ----------------- ---------- ----------
253442 23334 0 1 0
253443 23334 0 2 0
253444 23334 1 3 1
253445 23334 1 4 2
1050646 23334 0 5 0
8457065 23334 0 6 0
9052920 23334 2 7 2
9333044 23334 1 8 3
9616743 23334 0 9 0
9894491 23334 0 10 0
10186697 23334 0 11 0
10490938 23334 0 12 0
10803986 23334 1 13 1
11132317 23334 1 14 2
11444923 23334 0 15 0
It uses a very similar concept to the original solution, except by adding the check for zero or non-zero transaction_count into the partition by clause of the final sum() analytic function, we no longer need the case statement to output 0 or the sum.
Hopefully you can tell what tweaks I had to make - basically, the checks for transaction_id is null/not null had to be changed to transaction_count = 0/!= 0, plus the change of row_number()
to sum(transaction_count)
plus the aforementioned change to the partition by clause. I'm sure if you'd have thought about it some more, you'd have come to the same conclusion, if you haven't already! *{:-)