Search code examples
sqlgaps-and-islands

Transactions within an hour


timestamp sender
22-05-2022 16:19 Ram
22-05-2022 16:59 Ram
--------- ------
04-06-2022 18:19 Tom
04-06-2022 18:30 Tom
04-06-2022 19:20 Tom
04-06-2022 19:40 Tom
04-06-2022 19:50 Tom
04-06-2022 19:55 Tom
-------------- ------
04-06-2022 21:30 Tom
04-06-2022 21:35 Tom
04-06-2022 21:37 Tom
---------------- ------
04-06-2022 22:41 Tom
04-06-2022 22:45 Tom
---------------- ------
04-06-2022 23:47 Tom
04-06-2022 23:49 Tom
04-06-2022 23:50 Tom
04-06-2022 23:55 Tom
---------------- ------
08-06-2022 08:30 Tom
08-06-2022 09:15 Tom
08-06-2022 09:45 Tom
08-06-2022 09:55 Tom
-------------------- ------
10-06-2022 09:19 John
10-06-2022 09:21 John
---------------- ------
08-08-2022 18:19 Harry
08-08-2022 18:32 Harry
---------------- ------

I have table of timestamps and sender. All consecutive timestamps by the same sender within 60 minutes are considered to be from the same branch like Ram at 16:19 and 16.59 Tom between 18:19 and 19:55.I've delimited the timestamps at same branch by "-------" in the table.

How to write a SQL query so that I can print the start and end of transactions at same branch? I've written the following SQL query using SQLite. I use juliandate to find the date difference. I'm allowed to use only SQL and window functions and CTE not any vendor specific extensions. Except for date part for diff the code should work on MySQL, SQLite, Oracle, SQL Server and DB2.

Can someone tell me how to solve this query in a better and elegant way ?
--- get the next row as a column
with nxttab as (
    select 
    timestamp,LEAD(timestamp,1) over(PARTITION by sender order by timestamp) as nxttimestamp,sender FROM Transactions
), 
--- get all Transactions within 60 minutes
difftab as (
    select timestamp, nxttimestamp,sender FROM  nxttab where (julianday(nxttimestamp) - julianday(timestamp))*24*60 < 60
), 
---  get the difference of current transaction - previous next transaction
endpoints as (
    select  julianday(timestamp) - julianday(LAG(nxttimestamp,1) OVER ( PARTITION BY sender order by timestamp )) as diffnextday,* from difftab
),
--- Mark start of new transaction with 1 and all consecutive transactions with 0
 intervals as (
    select CASE WHEN diffnextday!=0 THEN 1 ELSE 0 END as startnew ,* from endpoints
),
--- Get the partition number of each transaction by cummulative sum from top to current row
 partitions as (
select SUM(startnew) OVER (PARTITION BY Sender ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as partno,* from intervals
)
-- group by sender,partition number find min,max and count(*)+1
select sender,partno,min(timestamp) as start,max(nxttimestamp) as end ,count(*)+1 as count from partitions
group by sender,partno order by sender;

Solution

  • You can use the following query

    WITH PrevValues AS (
    -- get previous and next timestamps
        SELECT *,
          prevTime =  LAG(timestamp) OVER (PARTITION BY sender ORDER BY timestamp),
          nextTime = LEAD(timestamp) OVER (PARTITION BY sender ORDER BY timestamp)
        FROM Transactions t
    ),
    Filtered AS (
    -- filter to only include the beginning and end of each section
        SELECT *,
    -- calculate a new previous time, for the beginning of each section
          newPrevTime = LAG(timestamp) OVER (PARTITION BY sender ORDER BY timestamp)
        FROM PrevValues
           -- is start
        WHERE prevTime IS NULL
           OR DATEADD(hour, 1, prevTime) < timestamp
           -- or is end
           OR nextTime IS NULL
           OR DATEADD(hour, -1, nextTime) >= timestamp
    )
    SELECT
      sender,
      StartTime = newPrevTime,
      EndTime = timestamp
    FROM Filtered
    -- only include the end rows
    WHERE nextTime IS NULL
       OR DATEADD(hour, -1, nextTime) >= timestamp;
    

    db<>fiddle