Search code examples
sqlpostgresqlintervalsgaps-and-islands

How to get a date interval with condition


How to get a continuous date interval from rows fulfilling specific condition?

I have a table of employees states with 2 types of user_position. The interval is continuous if the next higher date_position per user_id has the same user_id, the next day value and user_position didn't change. The user cannot have different user positions in one day.

Have a feeling it requires several cases, window functions and tsrange, but can't quite get the right result.

I would be really grateful if you could help me.

Fiddle:

http://sqlfiddle.com/#!17/ba641/1/0

The result should look like this:

user_id user_position position_start position_end
1 1 01.01.2019 02.01.2019
1 2 03.01.2019 04.01.2019
1 1 05.01.2019 06.01.2019
2 1 01.01.2019 03.01.2019
2 2 04.01.2019 05.01.2019
2 2 08.01.2019 08.01.2019
2 2 10.01.2019 10.01.2019

Create/insert query for the source data:

CREATE TABLE IF NOT EXISTS users_position
(   id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id integer,
    user_position integer,
    date_position date);

INSERT INTO users_position (user_id,
                        user_position,
                        date_position)
VALUES
(1, 1, '2019-01-01'),
(1, 1, '2019-01-02'),
(1, 2, '2019-01-03'),
(1, 2, '2019-01-04'),
(1, 1, '2019-01-05'),
(1, 1, '2019-01-06'),
(2, 1, '2019-01-01'),
(2, 1, '2019-01-02'),
(2, 1, '2019-01-03'),
(2, 2, '2019-01-04'),
(2, 2, '2019-01-05'),
(2, 2, '2019-01-08'),
(2, 2, '2019-01-10');

Solution

  • SELECT user_id, user_position
         , min(date_position) AS position_start
         , max(date_position) AS position_end
    FROM  (
       SELECT user_id, user_position,date_position
            , count(*) FILTER (WHERE (date_position = last_date + 1
                                      AND user_position = last_pos) IS NOT TRUE)
                       OVER (PARTITION BY user_id ORDER BY date_position) AS interval
       FROM (
          SELECT user_id, user_position, date_position
               , lag(date_position) OVER w AS last_date
               , lag(user_position) OVER w AS last_pos
          FROM   users_position
          WINDOW w AS (PARTITION BY user_id ORDER BY date_position)
          ) sub1
       ) sub2
    GROUP  BY user_id, user_position, interval
    ORDER  BY user_id, interval;
    

    db<>fiddle here

    Basically, this forms intervals by counting the number of disruptions in continuity. Whenever the "next" row per user_id is not what's expected, a new interval starts.

    The WINDOW clause allows to specify a window frame once and use it repeatedly; no effect on performance.

    last_date + 1 works while last_date is type date. See:

    Related:

    About the aggregate FILTER: