Situation
We have a PostgreSQL 9.1 database containing user sessions with login date/time and logout date/time per row. Table looks like this:
user_id | login_ts | logout_ts
------------+--------------+--------------------------------
USER1 | 2021-02-03 09:23:00 | 2021-02-03 11:44:00
USER2 | 2021-02-03 10:49:00 | 2021-02-03 13:30:00
USER3 | 2021-02-03 13:32:00 | 2021-02-03 15:31:00
USER4 | 2021-02-04 13:50:00 | 2021-02-04 14:53:00
USER5 | 2021-02-04 14:44:00 | 2021-02-04 15:21:00
USER6 | 2021-02-04 14:52:00 | 2021-02-04 17:59:00
Goal
Would like to get the max number of concurrent users for each 24 hours of each day in the time range. Like this:
date | hour | sessions
-----------+-------+-----------
2021-02-03 | 01:00 | 0
2021-02-03 | 02:00 | 0
2021-02-03 | 03:00 | 0
2021-02-03 | 04:00 | 0
2021-02-03 | 05:00 | 0
2021-02-03 | 06:00 | 0
2021-02-03 | 07:00 | 0
2021-02-03 | 08:00 | 0
2021-02-03 | 09:00 | 1
2021-02-03 | 10:00 | 2
2021-02-03 | 11:00 | 2
2021-02-03 | 12:00 | 1
2021-02-03 | 13:00 | 1
2021-02-03 | 14:00 | 1
2021-02-03 | 15:00 | 0
2021-02-03 | 16:00 | 0
2021-02-03 | 17:00 | 0
2021-02-03 | 18:00 | 0
2021-02-03 | 19:00 | 0
2021-02-03 | 20:00 | 0
2021-02-03 | 21:00 | 0
2021-02-03 | 22:00 | 0
2021-02-03 | 23:00 | 0
2021-02-03 | 24:00 | 0
2021-02-04 | 01:00 | 0
2021-02-04 | 02:00 | 0
2021-02-04 | 03:00 | 0
2021-02-04 | 04:00 | 0
2021-02-04 | 05:00 | 0
2021-02-04 | 06:00 | 0
2021-02-04 | 07:00 | 0
2021-02-04 | 08:00 | 0
2021-02-04 | 09:00 | 0
2021-02-04 | 10:00 | 0
2021-02-04 | 11:00 | 0
2021-02-04 | 12:00 | 0
2021-02-04 | 13:00 | 1
2021-02-04 | 14:00 | 3
2021-02-04 | 15:00 | 1
2021-02-04 | 16:00 | 1
2021-02-04 | 17:00 | 1
2021-02-04 | 18:00 | 0
2021-02-04 | 19:00 | 0
2021-02-04 | 20:00 | 0
2021-02-04 | 21:00 | 0
2021-02-04 | 22:00 | 0
2021-02-04 | 23:00 | 0
2021-02-04 | 24:00 | 0
Considerations
Similar questions
A similar question was answered here: Count max. number of concurrent user sessions per day by Erwin Brandstetter. However, this is per day rather than per hour, and I am apparently too much of a noob at postgreSQL to be able to translate it into hourly so I'm hoping someone can help.
I would decompose this into two problems:
Note two things:
'2014-04-03 17:59:00'
is a typo.First, calculate the overlaps. For this, unpivot the logins and logout. Put in a counter of +1
for logins and -1
for logouts and do a cumulative sum. This looks like:
with overlap as (
select v.ts, sum(v.inc) as inc,
sum(sum(v.inc)) over (order by v.ts) as num_overlaps,
lead(v.ts) over (order by v.ts) as next_ts
from sessions s cross join lateral
(values (login_ts, 1), (logout_ts, -1)) v(ts, inc)
group by v.ts
)
select *
from overlap
order by ts;
For the next step, use generate_series()
to generate timestamps one hour apart. Look for the maximum value during that period using left join
and group by
:
with overlap as (
select v.ts, sum(v.inc) as inc,
sum(sum(v.inc)) over (order by v.ts) as num_overlaps,
lead(v.ts) over (order by v.ts) as next_ts
from sessions s cross join lateral
(values (login_ts, 1), (logout_ts, -1)) v(ts, inc)
group by v.ts
)
select gs.hh, coalesce(max(o.num_overlaps), 0) as num_overlaps
from generate_series('2021-02-03'::date, '2021-02-05'::date, interval '1 hour') gs(hh) left join
overlap o
on o.ts < gs.hh + interval '1 hour' and
o.next_ts > gs.hh
group by gs.hh
order by gs.hh;
Here is a db<>fiddle using your data fixed with the a reasonable logout time for the last record.