Search code examples
sqlpostgresqlwindow-functions

How to get lead value of a specific case in Postgresql?


I have a table like below

CREATE TABLE events (
  id SERIAL PRIMARY KEY,
  group_id INT,
  code VARCHAR,
  created_date timestamptz
);

Sample values

INSERT INTO events (group_id, code, created_date) VALUES 
(1, '0001', clock_timestamp()),
(1, '000A', clock_timestamp()),
(1, '0002', clock_timestamp()),
(2, '000A', clock_timestamp()),
(2, '0003', clock_timestamp()),
(2, '0004', clock_timestamp()),
(3, '0001', clock_timestamp()),
(3, '000A', clock_timestamp()),
(4, '0001', clock_timestamp()),
(5, '000A', clock_timestamp()),
(5, '0002', clock_timestamp()),
(5, '000A', clock_timestamp()),
(5, '0006', clock_timestamp());

I have a query

SELECT 
 DISTINCT ON (group_id) group_id, 
 last_value(code) over w_last AS last_code, 
 last_value(created_date) over w_last AS last_created_date
FROM events
WINDOW
 w_last AS (PARTITION BY group_id ORDER BY created_date ROWS BETWEEN unbounded preceding AND unbounded following);

Query Result

| group_id | last_code | last_created_date        |
| -------- | --------- | ------------------------ |
| 1        | 0002      | 2020-11-18T09:25:53.443Z |
| 2        | 0004      | 2020-11-18T09:25:53.443Z |
| 3        | 000A      | 2020-11-18T09:25:53.443Z |
| 4        | 0001      | 2020-11-18T09:25:53.443Z |
| 5        | 0006      | 2020-11-18T09:25:53.443Z |

I'm trying to get one additional field to show event after last '000A' event

 Eg. for group_id 1 event code after 000A is 0002
     for group_id 2 event code after 000A is 0003
     for group_id 3 event code after 000A is NULL
     for group_id 4 event code after 000A is NULL
     for group_id 5 event code after 000A is 0006 (from last)

Expected Output

| group_id | last_code | after_last_000A | last_created_date        |
| -------- | --------- | --------------- | ------------------------ |
| 1        | 0002      |     0002        | 2020-11-18T09:25:53.443Z |
| 2        | 0004      |     0003        | 2020-11-18T09:25:53.443Z |
| 3        | 000A      |     NULL        | 2020-11-18T09:25:53.443Z |
| 4        | 0001      |     NULL        | 2020-11-18T09:25:53.443Z |
| 5        | 0006      |     0006        | 2020-11-18T09:25:53.443Z |

Same in DB Fiddle


Solution

  • step-by-step demo: db<>fiddle

    SELECT DISTINCT ON (group_id)                                              -- 4
        group_id,
        code as last_code,
        first_value(after_000A)                                                -- 3
            OVER (PARTITION BY group_id ORDER BY after_000A DESC NULLS LAST) as after_last,
        created_date as last_created_date
    FROM (
        SELECT 
            *,
            CASE WHEN code = '000A' THEN                                       -- 2
                lead(code) OVER (PARTITION BY group_id ORDER BY created_date)  -- 1
            END as after_000A
        FROM 
            events
    ) s
    ORDER BY group_id, created_date DESC                                       -- 4
    
    1. lead() return the next row's value if available, NULL else
    2. CASE clause eleminates all unrelated values. Therefore, only the lead values for codes '000A' are displayed. This is the new after_000A column
    3. first_value() gives the first value of an ordered partition. Here the group_id partitions are ordered by the values of the previously created after_000A column (descending to fetch the last one).
    4. DISTINCT ON (group_id) returns only one record per group_id group, in that case the last one (ordered by created_date desc).