I want to calculate session intervals for multiple, possibly overlapping sessions per agent. I have a log table like this in a PostgreSQL database:
id dt session_id agent_id state
7 2024-01-25 22:26:57 4 3148 0
7 2024-01-25 22:24:57 4 3148 1
6 2024-01-25 22:23:57 2 3148 0
5 2024-01-25 15:53:30 1 3148 0
4 2024-01-25 15:53:30 3 3148 0
3 2024-01-25 13:53:02 3 3148 1
2 2024-01-25 12:43:10 2 3148 1
1 2024-01-25 12:30:02 1 3148 1
state = 1
indicates the start of a session, state = 0
its end.
For single sessions I use the lag()
function:
select agent_id, status, dt, lag(dt) over (partition by (agent_id) order by dt) as pre_status_dt from
(
select *,
lag(status) over (partition by (agent_id) order by dt) as pre_status
from (
select cs.dt, cs.user_id as agent_id, cs.status
from "channel_subscribe" cs
where cs.channel = 14
) as t1
where t1.dt >= '2024-01-01'
) as t2
where t2.status != t2.pre_status
order by dt asc
But that does not work for overlapping sessions.
I would like to get this result:
agent_id start_dt end_dt
3148 2024-01-25 12:30:02 2024-01-25 22:23:57
3148 2024-01-25 22:24:57 2024-01-25 22:26:57
Your sample shows nested sessions: Additional sessions start and end within the first open session of the same agent. Alternatively, sessions might be chained: next session starts within the first session, but ends later.
The following solutions cover both cases: A combined session starts with the first open session, and ends once all open sessions have been closed.
Each assumes consistent data: Every session that gets closed has been opened before (or at the same time, but with am earlier id
). Unfinished sessions are reported as such (with null for end_dt
). If your data is not as reliable, you need to define what can go wrong, and handle cases accordingly.
SELECT agent_id
, min(dt) AS start_dt
, max(dt) FILTER (WHERE sum_state = 0) AS end_dt -- !
FROM (
SELECT *
, count(*) FILTER (WHERE sum_state = 0)
OVER (PARTITION BY agent_id ORDER BY dt DESC, id DESC) AS island
FROM (
SELECT *
, sum(CASE state WHEN 0 THEN -1 ELSE 1 END)
OVER (PARTITION BY agent_id ORDER BY dt, id) sum_state
FROM channel_subscribe
) sub1
) sub2
GROUP BY agent_id, island
ORDER BY agent_id, island DESC;
The query in sub1
adds a running count per agent as sum_state
. A combined session ends when sum_state
goes back to 0.
Subquery sub2
forms groups (island
) by counting "0"-events per agent. Note the descending sort (DESC
) to include each "0"-event in its group.
The outer query reports data as requested for every combined session. The added FILTER (WHERE sum_state = 0)
makes sure that unfinished sessions are reported as such.
Once we need multiple subquery levels (with changing sort order), it's probably faster to use a (combined) procedural solution like this PL/pgSQL function:
CREATE OR REPLACE FUNCTION f_combined_sessions()
RETURNS TABLE (agent_id int, start_dt timestamp, end_dt timestamp)
LANGUAGE plpgsql AS
$func$
DECLARE
r record;
_new_session bool;
_agent_id int;
BEGIN
FOR r IN
SELECT c.agent_id, c.dt
, sum(CASE c.state WHEN 0 THEN -1 ELSE 1 END)
OVER (PARTITION BY c.agent_id ORDER BY c.dt, c.id) AS sum_state
FROM channel_subscribe c
ORDER BY c.agent_id, c.dt, c.id
LOOP
IF agent_id = r.agent_id THEN -- same agent
IF _new_session = true THEN
start_dt := r.dt;
_new_session := false;
END IF;
ELSE -- new agent
-- return open session?
IF _new_session = false THEN
end_dt := null;
RETURN NEXT;
END IF;
agent_id := r.agent_id;
start_dt := r.dt;
_new_session := false;
END IF;
IF r.sum_state = 0 THEN
end_dt := r.dt;
RETURN NEXT;
_new_session := true;
END IF;
END LOOP;
-- return last open session?
IF _new_session = false THEN
end_dt := null;
RETURN NEXT;
END IF;
END
$func$;
Call:
SELECT * FROM f_combined_sessions();
Related: