Search code examples
sqlsql-serverwindow-functionsgaps-and-islands

How can I return the list of each partition suspicious transactions from each sender?


Here is the test:

Suspicious transactions are defined as:

  • A series of two or more transactions occur at intervals of an hour or less
  • They are from the same sender
  • The sum of transactions in a sequence is 150 or greater

The result should have the following columns: sender, sequence_start, sequence_end, transactions_count, transactions_sum

  • sender is the sender’s address.
  • sequence_start is the timestamp of the first transaction in the sequence.
  • sequence_end is the timestamp of the last transaction in the sequence.
  • transactions_count is the number of transactions in the sequence.
  • transactions_sum is the sum of transaction amounts in the sequence, to 6 places after the decimal.

Note: There are many sequences of suspicious transactions which are separated by many hours between them from one sender!!! CREATE TABLE #trans (sender int, dt datetime, amount numeric(19,2))

INSERT INTO #trans
VALUES  (1, '2023/05/23 07:11', 60)
,   (1, '2023/05/23 09:10', 100)
,   (1, '2023/05/23 09:11', 70)
,   (1, '2023/05/23 13:11', 35)
,   (1, '2023/05/23 16:12', 140)
,   (1, '2023/05/23 16:14', 200)
,   (2, '2023/05/23 01:14', 160)
,   (2, '2023/05/23 06:14', 160)
,   (2, '2023/05/23 07:11', 100)
,   (3, '2023/05/23 07:11', 100)
,   (3, '2023/05/23 17:11', 100)

So the result would be:

sender    sequence_start      sequence_end     transactions_count   transactions_sum 
1         2023/05/23 09:10     2023/05/23 09:11    2    170          

1         2023/05/23 16:12     2023/05/23 16:14    2    340

2         2023/05/23 06:14     2023/05/23 07:11    2    260

Solution

  • This is just a standard gaps and island problem, together with a sum per group check.

    CREATE TABLE #trans (sender int, dt datetime, amount numeric(19,2))
    
    INSERT INTO #trans
    VALUES  (1, '20230523 08:11', 10)
    ,   (1, '20230523 09:11', 100)
    ,   (1, '20230523 10:11', 35)
    ,   (1, '20230523 13:11', 35)
    ,   (2, '20230523 01:14', 160)
    ,   (2, '20230523 06:14', 160)
    ,   (2, '20230523 07:11', 100)
    ,   (2, '20230523 07:11', 100)
    ,   (2, '20230523 17:11', 100)
    
    SELECT  sender, MIN(dt), max(dt), SUM(amount)
    FROM    (
        
        SELECT  COUNT(grp) OVER(partition BY sender ORDER BY dt) AS grp2
        ,   *
        FROM    (
            SELECT  CASE WHEN datediff(minute, lag(dt) OVER(partition BY sender ORDER BY dt), dt) > 60 THEN 1 END AS grp
            ,   *
            FROM    #trans
            ) x
        ) x
    GROUP BY sender, grp2
    HAVING sum(amount) >= 150
    AND COUNT(*) > 1
    

    You follow the code from the inner select.

    First you create a flag (grp) that says if previous value is separate from current (ie. there's more than 60 minutes between the dates). I usually make sure the value is NULL if the group is the same, this will mean the count will not increase the group.

    Then by counting this flag in order, you create a group counter (grp2): 0, 1, 2 etc.

    Finally, you do a group by and make sure total amount is >= 150 as well as you get more than 1 transaction