Search code examples
sqloracleteradatasql

Group customer transactions based on the subsequent transaction


Small change in the requirement

Needed Teradata SQL query to create a grouping from the first transaction to its subsequent transactions. If a subsequent transaction is made within 30 days of the previous transaction, consider them all to belongs to the same group. Break the group and start a new one if any transaction is made more than 30 days after the previous transaction.

data:

trans_Id customer_id trans_date
001 1101 2020-11-02
002 1101 2020-11-14
003 1101 2020-11-18
004 1101 2021-12-04
005 1101 2021-01-06
006 1101 2021-01-08
007 1101 2021-02-17
008 1101 2021-03-01
009 1101 2021-03-04
010 1102 2021-03-02
011 1102 2021-03-08
012 1102 2021-04-01
013 1102 2021-04-02
014 1102 2021-04-12
015 1102 2021-04-29
016 1102 2021-06-10
017 1102 2021-06-12

Expected result (expecting grouping like below). from 001 till 004 fall under within 30 day to its previous transaction. 005 has elapsed 30 day from it's previous transaction so new group has to be started.

trans_Id customer_id trans_date Group
001 1101 2020-11-02 1
002 1101 2020-11-14 1
003 1101 2020-11-18 1
004 1101 2021-12-04 1
005 1101 2021-01-06 2
006 1101 2021-01-08 2
007 1101 2021-02-17 3
008 1101 2021-03-01 3
009 1101 2021-03-04 3
010 1102 2021-03-02 1
011 1102 2021-03-08 1
012 1102 2021-04-01 2
013 1102 2021-04-02 2
014 1102 2021-04-12 2
015 1102 2021-04-29 2
016 1102 2021-06-10 3
017 1102 2021-06-12 3

Solution

  • From Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row pattern matching:

    SELECT *
    FROM   table_name
    MATCH_RECOGNIZE(
      PARTITION BY customer_id
      ORDER BY trans_date
      MEASURES
        MATCH_NUMBER() AS grp
      ALL ROWS PER MATCH
      PATTERN ( within_30_days+ )
      DEFINE
        within_30_days AS trans_date <= FIRST(trans_date) + INTERVAL '30' DAY
    )
    

    Which, for the sample data:

    CREATE TABLE table_name (trans_id, customer_id, trans_date) AS
      SELECT '001', '1101',  DATE '2020-11-02' FROM DUAL UNION ALL
      SELECT '002', '1101',  DATE '2020-11-14' FROM DUAL UNION ALL
      SELECT '003', '1101',  DATE '2020-11-18' FROM DUAL UNION ALL
      SELECT '004', '1101',  DATE '2021-12-04' FROM DUAL UNION ALL
      SELECT '005', '1101',  DATE '2021-12-05' FROM DUAL UNION ALL
      SELECT '006', '1101',  DATE '2021-12-08' FROM DUAL UNION ALL
      SELECT '007', '1101',  DATE '2021-01-17' FROM DUAL UNION ALL
      SELECT '008', '1101',  DATE '2021-05-01' FROM DUAL UNION ALL
      SELECT '009', '1101',  DATE '2021-05-04' FROM DUAL UNION ALL
      SELECT '010', '1102',  DATE '2021-03-02' FROM DUAL UNION ALL
      SELECT '011', '1102',  DATE '2021-03-08' FROM DUAL UNION ALL
      SELECT '012', '1102',  DATE '2021-04-01' FROM DUAL UNION ALL
      SELECT '013', '1102',  DATE '2021-04-02' FROM DUAL UNION ALL
      SELECT '014', '1102',  DATE '2021-04-12' FROM DUAL UNION ALL
      SELECT '015', '1102',  DATE '2021-04-29' FROM DUAL UNION ALL
      SELECT '016', '1102',  DATE '2021-06-10' FROM DUAL UNION ALL
      SELECT '017', '1102',  DATE '2021-06-12' FROM DUAL;
    

    Outputs:

    CUSTOMER_ID TRANS_DATE GRP TRANS_ID
    1101 2020-11-02 00:00:00 1 001
    1101 2020-11-14 00:00:00 1 002
    1101 2020-11-18 00:00:00 1 003
    1101 2021-01-17 00:00:00 2 007
    1101 2021-05-01 00:00:00 3 008
    1101 2021-05-04 00:00:00 3 009
    1101 2021-12-04 00:00:00 4 004
    1101 2021-12-05 00:00:00 4 005
    1101 2021-12-08 00:00:00 4 006
    1102 2021-03-02 00:00:00 1 010
    1102 2021-03-08 00:00:00 1 011
    1102 2021-04-01 00:00:00 1 012
    1102 2021-04-02 00:00:00 2 013
    1102 2021-04-12 00:00:00 2 014
    1102 2021-04-29 00:00:00 2 015
    1102 2021-06-10 00:00:00 3 016
    1102 2021-06-12 00:00:00 3 017

    fiddle