I work in PostgreSQL.
I have a table that stores info about clients and their status in different groups. I want a select query that will show this information like periods instead of event structure. So it will show client_id, moment_start, moment_end, status_group and status (boolean). Although I can use the lead(), I get results for a client where the next period with the same status is another record while I want to get periods where the status changes from row to row (by client_id and status_group). In reality, I use timestamp(6) instead of date.
What should I write in the query to achieve that?
Original table:
id | client_id | moment | status_group | status |
---|---|---|---|---|
1 | 1 | 2021-05-01 | A | TRUE |
2 | 1 | 2021-05-05 | A | TRUE |
3 | 1 | 2021-05-07 | A | FALSE |
4 | 1 | 2021-06-10 | A | TRUE |
5 | 2 | 2021-07-10 | A | TRUE |
6 | 2 | 2021-05-10 | B | FALSE |
Required table:
client_id | moment_start | moment_end | status_group | status |
---|---|---|---|---|
1 | 2021-05-01 | 2021-05-07 | A | TRUE |
1 | 2021-05-07 | 2021-06-10 | A | FALSE |
1 | 2021-06-10 | NULL | A | TRUE |
2 | 2021-07-10 | NULL | A | TRUE |
2 | 2021-05-10 | NULL | B | FALSE |
I tried:
SELECT
client_id,
moment AS moment_start
LEAD(moment) OVER (PARTITION BY client_id, status_group ORDER BY moment) AS moment_end,
status_group,
status
FROM table
ORDER BY client_id, moment_start
The table I got:
client_id | moment_start | moment_end | status_group | status |
---|---|---|---|---|
1 | 2021-05-01 | 2021-05-05 | A | TRUE |
1 | 2021-05-05 | 2021-05-07 | A | TRUE |
1 | 2021-05-07 | 2021-06-10 | A | FALSE |
1 | 2021-06-10 | NULL | A | TRUE |
2 | 2021-07-10 | NULL | A | TRUE |
2 | 2021-05-10 | NULL | B | FALSE |
The following query demonstrates an approach to extract only status changes:
WITH
example_data (id, client_id, moment, status_group, status) AS (
VALUES
(1, 1, '2021-05-01'::TIMESTAMP, 'A', TRUE),
(2, 1, '2021-05-05'::TIMESTAMP, 'A', TRUE),
(3, 1, '2021-05-07'::TIMESTAMP, 'A', FALSE),
(4, 1, '2021-06-10'::TIMESTAMP, 'A', TRUE),
(5, 2, '2021-07-10'::TIMESTAMP, 'A', TRUE),
(6, 2, '2021-05-10'::TIMESTAMP, 'B', FALSE)
),
changes AS (
SELECT
d.id,
d.client_id,
d.moment,
d.status_group,
d.status,
(d.status_group, d.status) IS DISTINCT FROM (LAG(d.status_group) OVER client_moment,
LAG(d.status) OVER client_moment) AS client_status_changed
FROM
example_data d
WINDOW
client_moment AS (PARTITION BY d.client_id, d.status_group ORDER BY d.moment)
)
SELECT
c.client_id,
c.moment,
c.status_group,
c.status
FROM
changes c
WHERE
c.client_status_changed
ORDER BY
c.client_id, c.moment;
The following result is produced:
client_id | moment | status_group | status |
---|---|---|---|
1 | 2021-05-01 00:00:00 | A | true |
1 | 2021-05-07 00:00:00 | A | false |
1 | 2021-06-10 00:00:00 | A | true |
2 | 2021-05-10 00:00:00 | B | false |
2 | 2021-07-10 00:00:00 | A | true |
The query uses a named window frame to reduce redundant code. The query can easily be modified to return include both a moment_start
and a moment_end
by using the LAG
function in the main query.