I have a transactions table with columns user_id, transaction_timestamp, transaction_currency. Assume that home currency is 'ABC' and hence domestic transactions are all denoted by transaction_currency = 'ABC'. I want to define another column to the table denoting the trip number of each use based on the progression of transaction_currency. Given the ordered (by timestamp) list of transaction for a specific user, a new trip starts at each transaction where transaction_currency is not 'ABC' and the immediately previous transaction had transaction_currency = 'ABC'. Once a trip starts, all transactions are part of the same trip number until the immediately next transaction has transaction_currency = 'ABC'. Again, the very next time another currency appears (other than 'ABC'), a new trip number is assigned to all transactions until it is ended by another transaction having transaction_currency = 'ABC'. For all transactions with transaction_currency = 'ABC', trip number can be defaulted to 0. For others, trip number starts from 1 and increments by 1 for every new trip.
Here is the expected result for a sample dataset
USER | TXN_TIMESTAMP | TRANSACTION_CURRENCY | TRIP_NUMBER |
---|---|---|---|
1 | 1 | ABC | 0 |
1 | 2 | AED | 1 |
1 | 3 | AED | 1 |
1 | 4 | AED | 1 |
1 | 5 | ABC | 0 |
1 | 6 | ABC | 0 |
1 | 7 | AED | 2 |
1 | 8 | AED | 2 |
1 | 9 | AED | 2 |
1 | 10 | AED | 2 |
1 | 11 | ABC | 0 |
1 | 12 | ABC | 0 |
1 | 13 | ABC | 0 |
1 | 14 | THB | 3 |
1 | 15 | THB | 3 |
1 | 16 | THB | 3 |
1 | 17 | USD | 3 (trip num should not change here) |
1 | 18 | USD | 3 |
1 | 19 | ABC | 0 |
1 | 20 | AED | 5 |
I know this can be done by window functions, but I am a beginner.
I tried the following
WITH TripStarts AS (
SELECT
user_id,
transaction_timestamp,
transaction_currency,
LAG(transaction_currency, 1, 'ABC') OVER (PARTITION BY user_id ORDER BY transaction_timestamp) AS prev_currency,
FROM
transactions
)
SELECT
*,
CASE
WHEN transaction_currency = 'ABC' THEN 0
WHEN transaction_currency != 'ABC' AND prev_currency = 'ABC' THEN ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY transaction_timestamp)
END AS trip_number
FROM TripStarts
But the trip_number keeps getting incremented for each transaction in the same trip.
Awesome if this can be done by commonly used window functions.
This is a gaps problem, i usually do a two step counter + sum solution for that:
select *
, case when transaction_currency = 'abc' then 0 else SUM(flag) over(partition by [user] order by txn_timestamp rows between unbounded preceding and current row) end as trip_number_new
from (
select *, case when lag(transaction_currency) over(partition by [user] order by txn_timestamp) = 'ABC' and transaction_currency <> 'abc' then 1 else 0 end as flag
from (
VALUES (1, 1, N'ABC', N'0')
, (1, 2, N'AED', N'1')
, (1, 3, N'AED', N'1')
, (1, 4, N'AED', N'1')
, (1, 5, N'ABC', N'0')
, (1, 6, N'ABC', N'0')
, (1, 7, N'AED', N'2')
, (1, 8, N'AED', N'2')
, (1, 9, N'AED', N'2')
, (1, 10, N'AED', N'2')
, (1, 11, N'ABC', N'0')
, (1, 12, N'ABC', N'0')
, (1, 13, N'ABC', N'0')
, (1, 14, N'THB', N'3')
, (1, 15, N'THB', N'3')
, (1, 16, N'THB', N'3')
, (1, 17, N'USD', N'3 (trip num should not change here)')
, (1, 18, N'USD', N'3')
, (1, 19, N'ABC', N'0')
, (1, 20, N'AED', N'5')
) t ([USER],TXN_TIMESTAMP,TRANSACTION_CURRENCY,TRIP_NUMBER)
) x
Step 1:
Create a flag that changes when you want to get a new trip number. In this case, we want it to happen when previous trip was ABC and current trip isn't ABC: case when lag(transaction_currency) over(partition by [user] order by txn_timestamp) = 'ABC' and transaction_currency <> 'abc' then 1 else 0 end
Step 2:
Summarize the flags over ordered by timestamp to get the trip number. For ABC, we set trip number to zero:
case when transaction_currency = 'abc' then 0 else SUM(flag) over(partition by [user] order by txn_timestamp rows between unbounded preceding and current row) end
Output:
USER | TXN_TIMESTAMP | TRANSACTION_CURRENCY | TRIP_NUMBER | flag | trip_number_new |
---|---|---|---|---|---|
1 | 1 | ABC | 0 | 0 | 0 |
1 | 2 | AED | 1 | 1 | 1 |
1 | 3 | AED | 1 | 0 | 1 |
1 | 4 | AED | 1 | 0 | 1 |
1 | 5 | ABC | 0 | 0 | 0 |
1 | 6 | ABC | 0 | 0 | 0 |
1 | 7 | AED | 2 | 1 | 2 |
1 | 8 | AED | 2 | 0 | 2 |
1 | 9 | AED | 2 | 0 | 2 |
1 | 10 | AED | 2 | 0 | 2 |
1 | 11 | ABC | 0 | 0 | 0 |
1 | 12 | ABC | 0 | 0 | 0 |
1 | 13 | ABC | 0 | 0 | 0 |
1 | 14 | THB | 3 | 1 | 3 |
1 | 15 | THB | 3 | 0 | 3 |
1 | 16 | THB | 3 | 0 | 3 |
1 | 17 | USD | 3 (trip num should not change here) | 0 | 3 |
1 | 18 | USD | 3 | 0 | 3 |
1 | 19 | ABC | 0 | 0 | 0 |
1 | 20 | AED | 5 | 1 | 4 |
Note, that output doesn't match on the last trip, but i think that's a mistake in your data