Search code examples
sqlpostgresqldatewindow-functionsgaps-and-islands

how to get non-overlapping distinct intervals in postgresql table?


the table has user sessions, i need distinct non overlapping sessions printed.

CREATE TABLE SESSIONS(
            id serial NOT NULL PRIMARY KEY, 
            ctn INT NOT NULL, 
            day DATE NOT NULL,
            f_time TIME(0) NOT NULL,
            l_time TIME(0) NOT  NULL
        );     
    INSERT INTO SESSIONS(id, ctn, day, f_time, l_time)
    VALUES
    (1, 707, '2019-06-18', '10:48:25', '10:56:17'),
    (2, 707, '2019-06-18', '10:48:33', '10:56:17'),
    (3, 707, '2019-06-18', '10:53:17', '11:00:49'),
    (4, 707, '2019-06-18', '10:54:31', '10:57:37'),
    (5, 707, '2019-06-18', '11:03:59', '11:10:39'),
    (6, 707, '2019-06-18', '11:04:41', '11:08:02'),
    (7, 707, '2019-06-18', '11:11:04', '11:19:39');

sql fiddle

My table looks like this:

id  ctn day         f_time      l_time
1   707 2019-06-18  10:48:25    10:56:17
2   707 2019-06-18  10:48:33    10:56:17
3   707 2019-06-18  10:53:17    11:00:49
4   707 2019-06-18  10:54:31    10:57:37
5   707 2019-06-18  11:03:59    11:10:39
6   707 2019-06-18  11:04:41    11:08:02
7   707 2019-06-18  11:11:04    11:19:39

Now i need distinct non-overlapping user sessions, so it should give me

1.  start_time: 10:48:25  end_time: 11:00:49  duration: 12min,24 sec
2.  start_time: 11:03:59  end_time: 11:10:39  duration: 6min,40 sec
3.  start_time: 11:11:04  end_time: 11:19:33  duration: 8min,29 sec

Solution

  • This is a gaps-and-islands problem. Here is an approach using window functions:

    select
        ctn,
        min(f_ts) start_ts,
        max(l_ts) end_ts,
        max(l_ts) - min(f_ts) duration
    from (
        select 
            s.*,
            count(*) filter(where f_ts > lag_l_ts) over(partition by ctn order by f_ts) grp
        from (
            select
                s.*,
                lag(l_ts) over(partition by ctn order by f_ts) lag_l_ts
            from (
                select
                    s.*,
                    (day + l_time)::timestamp l_ts,
                    (day + f_time)::timestamp f_ts
                from sessions s
            ) s
        ) s
    ) s
    group by ctn, grp
    order by ctn, start_ts
    

    The query works as follows:

    • first we rebuild proper timestamps from the date and time parts: storing data this way makes it uneasy to manipulate it (and it does not allow sessions to spread over different days)

    • one data is normalized, we use lag() to get the end timestamp of the "previous" row

    • we can then build groups of "adjacent" records: every time the start timestamp is greater than the previous end timestamp, a new group starts

    • the final step is aggregation

    Demo on DB Fiddle:

    ctn | start_ts            | end_ts              | duration
    --: | :------------------ | :------------------ | :-------
    707 | 2019-06-18 10:48:25 | 2019-06-18 11:00:49 | 00:12:24
    707 | 2019-06-18 11:03:59 | 2019-06-18 11:10:39 | 00:06:40
    707 | 2019-06-18 11:11:04 | 2019-06-18 11:19:39 | 00:08:35