Search code examples
sqlpostgresqlwindow-functionspostgresql-9.6

Set all values in a column equal to the first non-null value over a window in Postgres


Take this table as an example:

+----+------------+------+
| id |    date    | flag |
+----+------------+------+
| A  | 01/01/2020 |    0 |
| A  | 01/02/2020 |    0 |
| A  | 01/03/2020 |    0 |
| A  | 01/04/2020 |    1 |
| A  | 01/05/2020 |    1 |
| B  | 01/01/2020 |    0 |
| B  | 01/02/2020 |    1 |
| B  | 01/03/2020 |    1 |
| B  | 01/04/2020 |    1 |
| B  | 01/05/2020 |    1 |
+----+------------+------+

There is some flag that is either set to 0 or 1. I want to create a new column called day_flagged that will contain the date that the flag first became a 1. For example, for id A, that would be 01/04/2020. For id B, that would be 01/02/2020.

This is what I currently have:

SELECT x.id, 
       x.date, 
       ( CASE 
           WHEN prev_flag = 0 
                AND next_flag = 1 
                AND x.flag = 1 THEN 1 
           ELSE NULL 
         END ) AS flagged 
FROM   (SELECT id, 
               date, 
               flag, 
               Lag(flag) 
                 OVER ( 
                   partition BY id 
                   ORDER BY date ASC) AS prev_flag, 
               Lead(flag) 
                 OVER ( 
                   partition BY id 
                   ORDER BY date ASC) AS next_flag 
        FROM   tableA) AS x;

The result of that is this:

+----+------------+---------+
| id |    date    | flagged |
+----+------------+---------+
| A  | 01/01/2020 | null    |
| A  | 01/02/2020 | null    |
| A  | 01/03/2020 | null    |
| A  | 01/04/2020 | 1       |
| A  | 01/05/2020 | null    |
| B  | 01/01/2020 | null    |
| B  | 01/02/2020 | 1       |
| B  | 01/03/2020 | null    |
| B  | 01/04/2020 | null    |
| B  | 01/05/2020 | null    |
+----+------------+---------+

I'm able to identify when the value of flag for each id first changed from 0 to 1 and store that in flagged. How can I take the date value corresponding to the rows where flagged is 1, and insert that date into every row of the partition as day_flagged?

Desired outcome:

+----+------------+------+-------------+
| id |    date    | flag | day_flagged |
+----+------------+------+-------------+
| A  | 01/01/2020 |    0 | 01/04/2020  |
| A  | 01/02/2020 |    0 | 01/04/2020  |
| A  | 01/03/2020 |    0 | 01/04/2020  |
| A  | 01/04/2020 |    1 | 01/04/2020  |
| A  | 01/05/2020 |    1 | 01/04/2020  |
| B  | 01/01/2020 |    0 | 01/02/2020  |
| B  | 01/02/2020 |    1 | 01/02/2020  |
| B  | 01/03/2020 |    1 | 01/02/2020  |
| B  | 01/04/2020 |    1 | 01/02/2020  |
| B  | 01/05/2020 |    1 | 01/02/2020  |
+----+------------+------+-------------+

DB Fiddle: https://www.db-fiddle.com/f/wJsTnvNkYELHqLjHRx1pie/4


Solution

  • I understand that you want the date of the first 1 for each id.

    If so, it seems like a conditional window min() would fit your need:

    select
        t.*,
        min(date) filter(where flag = 1) over(partition by id) day_flagged
    from tableA t
    

    Demo on DB Fiddle:

    | id  | date       | flag | day_flagged |
    | --- | ---------- | ---- | ----------- |
    | A   | 01/01/2020 | 0    | 01/04/2020  |
    | A   | 01/02/2020 | 0    | 01/04/2020  |
    | A   | 01/03/2020 | 0    | 01/04/2020  |
    | A   | 01/04/2020 | 1    | 01/04/2020  |
    | A   | 01/05/2020 | 1    | 01/04/2020  |
    | B   | 01/01/2020 | 0    | 01/02/2020  |
    | B   | 01/02/2020 | 1    | 01/02/2020  |
    | B   | 01/03/2020 | 1    | 01/02/2020  |
    | B   | 01/04/2020 | 1    | 01/02/2020  |
    | B   | 01/05/2020 | 1    | 01/02/2020  |