timestamp | sender |
---|---|
22-05-2022 16:19 | Ram |
22-05-2022 16:59 | Ram |
--------- | ------ |
04-06-2022 18:19 | Tom |
04-06-2022 18:30 | Tom |
04-06-2022 19:20 | Tom |
04-06-2022 19:40 | Tom |
04-06-2022 19:50 | Tom |
04-06-2022 19:55 | Tom |
-------------- | ------ |
04-06-2022 21:30 | Tom |
04-06-2022 21:35 | Tom |
04-06-2022 21:37 | Tom |
---------------- | ------ |
04-06-2022 22:41 | Tom |
04-06-2022 22:45 | Tom |
---------------- | ------ |
04-06-2022 23:47 | Tom |
04-06-2022 23:49 | Tom |
04-06-2022 23:50 | Tom |
04-06-2022 23:55 | Tom |
---------------- | ------ |
08-06-2022 08:30 | Tom |
08-06-2022 09:15 | Tom |
08-06-2022 09:45 | Tom |
08-06-2022 09:55 | Tom |
-------------------- | ------ |
10-06-2022 09:19 | John |
10-06-2022 09:21 | John |
---------------- | ------ |
08-08-2022 18:19 | Harry |
08-08-2022 18:32 | Harry |
---------------- | ------ |
I have table of timestamps and sender. All consecutive timestamps by the same sender within 60 minutes are considered to be from the same branch like Ram at 16:19 and 16.59 Tom between 18:19 and 19:55.I've delimited the timestamps at same branch by "-------" in the table.
How to write a SQL query so that I can print the start and end of transactions at same branch? I've written the following SQL query using SQLite. I use juliandate to find the date difference. I'm allowed to use only SQL and window functions and CTE not any vendor specific extensions. Except for date part for diff the code should work on MySQL, SQLite, Oracle, SQL Server and DB2.
Can someone tell me how to solve this query in a better and elegant way ?
--- get the next row as a column
with nxttab as (
select
timestamp,LEAD(timestamp,1) over(PARTITION by sender order by timestamp) as nxttimestamp,sender FROM Transactions
),
--- get all Transactions within 60 minutes
difftab as (
select timestamp, nxttimestamp,sender FROM nxttab where (julianday(nxttimestamp) - julianday(timestamp))*24*60 < 60
),
--- get the difference of current transaction - previous next transaction
endpoints as (
select julianday(timestamp) - julianday(LAG(nxttimestamp,1) OVER ( PARTITION BY sender order by timestamp )) as diffnextday,* from difftab
),
--- Mark start of new transaction with 1 and all consecutive transactions with 0
intervals as (
select CASE WHEN diffnextday!=0 THEN 1 ELSE 0 END as startnew ,* from endpoints
),
--- Get the partition number of each transaction by cummulative sum from top to current row
partitions as (
select SUM(startnew) OVER (PARTITION BY Sender ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as partno,* from intervals
)
-- group by sender,partition number find min,max and count(*)+1
select sender,partno,min(timestamp) as start,max(nxttimestamp) as end ,count(*)+1 as count from partitions
group by sender,partno order by sender;
You can use the following query
WITH PrevValues AS (
-- get previous and next timestamps
SELECT *,
prevTime = LAG(timestamp) OVER (PARTITION BY sender ORDER BY timestamp),
nextTime = LEAD(timestamp) OVER (PARTITION BY sender ORDER BY timestamp)
FROM Transactions t
),
Filtered AS (
-- filter to only include the beginning and end of each section
SELECT *,
-- calculate a new previous time, for the beginning of each section
newPrevTime = LAG(timestamp) OVER (PARTITION BY sender ORDER BY timestamp)
FROM PrevValues
-- is start
WHERE prevTime IS NULL
OR DATEADD(hour, 1, prevTime) < timestamp
-- or is end
OR nextTime IS NULL
OR DATEADD(hour, -1, nextTime) >= timestamp
)
SELECT
sender,
StartTime = newPrevTime,
EndTime = timestamp
FROM Filtered
-- only include the end rows
WHERE nextTime IS NULL
OR DATEADD(hour, -1, nextTime) >= timestamp;