How to get a continuous date interval from rows fulfilling specific condition?
I have a table of employees states with 2 types of user_position. The interval is continuous if the next higher date_position per user_id has the same user_id, the next day value and user_position didn't change. The user cannot have different user positions in one day.
Have a feeling it requires several cases, window functions and tsrange, but can't quite get the right result.
I would be really grateful if you could help me.
Fiddle:
http://sqlfiddle.com/#!17/ba641/1/0
The result should look like this:
user_id | user_position | position_start | position_end |
---|---|---|---|
1 | 1 | 01.01.2019 | 02.01.2019 |
1 | 2 | 03.01.2019 | 04.01.2019 |
1 | 1 | 05.01.2019 | 06.01.2019 |
2 | 1 | 01.01.2019 | 03.01.2019 |
2 | 2 | 04.01.2019 | 05.01.2019 |
2 | 2 | 08.01.2019 | 08.01.2019 |
2 | 2 | 10.01.2019 | 10.01.2019 |
Create/insert query for the source data:
CREATE TABLE IF NOT EXISTS users_position
( id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
user_id integer,
user_position integer,
date_position date);
INSERT INTO users_position (user_id,
user_position,
date_position)
VALUES
(1, 1, '2019-01-01'),
(1, 1, '2019-01-02'),
(1, 2, '2019-01-03'),
(1, 2, '2019-01-04'),
(1, 1, '2019-01-05'),
(1, 1, '2019-01-06'),
(2, 1, '2019-01-01'),
(2, 1, '2019-01-02'),
(2, 1, '2019-01-03'),
(2, 2, '2019-01-04'),
(2, 2, '2019-01-05'),
(2, 2, '2019-01-08'),
(2, 2, '2019-01-10');
SELECT user_id, user_position
, min(date_position) AS position_start
, max(date_position) AS position_end
FROM (
SELECT user_id, user_position,date_position
, count(*) FILTER (WHERE (date_position = last_date + 1
AND user_position = last_pos) IS NOT TRUE)
OVER (PARTITION BY user_id ORDER BY date_position) AS interval
FROM (
SELECT user_id, user_position, date_position
, lag(date_position) OVER w AS last_date
, lag(user_position) OVER w AS last_pos
FROM users_position
WINDOW w AS (PARTITION BY user_id ORDER BY date_position)
) sub1
) sub2
GROUP BY user_id, user_position, interval
ORDER BY user_id, interval;
db<>fiddle here
Basically, this forms intervals by counting the number of disruptions in continuity. Whenever the "next" row per user_id
is not what's expected, a new interval starts.
The WINDOW
clause allows to specify a window frame once and use it repeatedly; no effect on performance.
last_date + 1
works while last_date
is type date
. See:
Related:
About the aggregate FILTER
: