Search code examples
sqlpostgresqldatetimegroup-bywindow-functions

I can't combine adjacent periods in a query


I work in PostgreSQL.

I have a table that stores info about clients and their status in different groups. I want a select query that will show this information like periods instead of event structure. So it will show client_id, moment_start, moment_end, status_group and status (boolean). Although I can use the lead(), I get results for a client where the next period with the same status is another record while I want to get periods where the status changes from row to row (by client_id and status_group). In reality, I use timestamp(6) instead of date.

What should I write in the query to achieve that?

Original table:

id client_id moment status_group status
1 1 2021-05-01 A TRUE
2 1 2021-05-05 A TRUE
3 1 2021-05-07 A FALSE
4 1 2021-06-10 A TRUE
5 2 2021-07-10 A TRUE
6 2 2021-05-10 B FALSE

Required table:

client_id moment_start moment_end status_group status
1 2021-05-01 2021-05-07 A TRUE
1 2021-05-07 2021-06-10 A FALSE
1 2021-06-10 NULL A TRUE
2 2021-07-10 NULL A TRUE
2 2021-05-10 NULL B FALSE

I tried:

SELECT
client_id,
moment AS moment_start
LEAD(moment) OVER (PARTITION BY client_id, status_group ORDER BY moment) AS moment_end,
status_group,
status
FROM table
ORDER BY client_id, moment_start

The table I got:

client_id moment_start moment_end status_group status
1 2021-05-01 2021-05-05 A TRUE
1 2021-05-05 2021-05-07 A TRUE
1 2021-05-07 2021-06-10 A FALSE
1 2021-06-10 NULL A TRUE
2 2021-07-10 NULL A TRUE
2 2021-05-10 NULL B FALSE

Solution

  • The following query demonstrates an approach to extract only status changes:

    WITH
        example_data (id, client_id, moment, status_group, status) AS (
            VALUES
                (1, 1, '2021-05-01'::TIMESTAMP, 'A', TRUE),
                (2, 1, '2021-05-05'::TIMESTAMP, 'A', TRUE),
                (3, 1, '2021-05-07'::TIMESTAMP, 'A', FALSE),
                (4, 1, '2021-06-10'::TIMESTAMP, 'A', TRUE),
                (5, 2, '2021-07-10'::TIMESTAMP, 'A', TRUE),
                (6, 2, '2021-05-10'::TIMESTAMP, 'B', FALSE)
        ),
        changes AS (
            SELECT
                d.id,
                d.client_id,
                d.moment,
                d.status_group,
                d.status,
                (d.status_group, d.status) IS DISTINCT FROM (LAG(d.status_group) OVER client_moment,
                                                             LAG(d.status) OVER client_moment) AS client_status_changed
            FROM
                example_data d
            WINDOW
                client_moment AS (PARTITION BY d.client_id, d.status_group ORDER BY d.moment)
        )
    SELECT
        c.client_id,
        c.moment,
        c.status_group,
        c.status
    FROM
        changes c
    WHERE
        c.client_status_changed
    ORDER BY
        c.client_id, c.moment;
    

    The following result is produced:

    client_id moment status_group status
    1 2021-05-01 00:00:00 A true
    1 2021-05-07 00:00:00 A false
    1 2021-06-10 00:00:00 A true
    2 2021-05-10 00:00:00 B false
    2 2021-07-10 00:00:00 A true

    The query uses a named window frame to reduce redundant code. The query can easily be modified to return include both a moment_start and a moment_end by using the LAG function in the main query.