I am trying to calculate a column that performs a running count of consecutive null values, but the running count will reset upon non-null values.
I am currently trying to achieve this on this version of redshift:
PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.8187
I tried using this window function but this is just continuously incrementing the number for each null.
ROW_NUMBER() OVER (PARTITION BY ID, VAL ORDER BY VAL ROWS UNBOUNDED PRECEDING)
For example if I had a dataset like this:
id | date | val
----+-------+-------
1 | 1/1 | NULL
1 | 1/2 | NULL
1 | 1/3 | NULL
1 | 1/4 | 1
1 | 1/5 | NULL
1 | 1/6 | NULL
1 | 1/7 | 1
2 | 1/8 | 2
2 | 1/9 | NULL
2 | 1/1 | NULL
2 | 1/2 | 1
2 | 1/3 | NULL
2 | 1/4 | 0
2 | 1/5 | NULL
2 | 1/6 | NULL
I would like the output to look like this:
id | date | val | foo
----+-------+-------+-------
1 | 1/1 | NULL | 1
1 | 1/2 | NULL | 2
1 | 1/3 | NULL | 3
1 | 1/4 | 1 |
1 | 1/5 | NULL | 1
1 | 1/6 | NULL | 2
1 | 1/7 | 1 |
2 | 1/8 | 2 |
2 | 1/9 | NULL | 1
2 | 1/1 | NULL | 2
2 | 1/2 | 1 |
2 | 1/3 | NULL | 1
2 | 1/4 | 0 |
2 | 1/5 | NULL | 1
2 | 1/6 | NULL | 2
Starter
I think that there is a glitch in your sample data, in the below highlighted records:
id | date | val | foo
----+-------+-------+-------
1 | 1/1 | NULL | 1
1 | 1/2 | NULL | 2
1 | 1/3 | NULL | 3
1 | 1/4 | 1 |
1 | 1/5 | NULL | 1
1 | 1/6 | NULL | 2
1 | 1/7 | 1 |
2 | 1/8 | 2 | --> this record is not in sequence
2 | 1/9 | NULL | 1 --> neither this one
2 | 1/1 | NULL | 2 --> so this record should have foo = 1, not 2
2 | 1/2 | 1 |
2 | 1/3 | NULL | 1
2 | 1/4 | 0 |
2 | 1/5 | NULL | 1
2 | 1/6 | NULL | 2
I simply expurged these three records from the dataset. If you are not ok with that, do not read further...
Answer
This is a variation of the gaps-and-island problem. To solve it, the idea is to build groups made of consecutive null records. For this, we compute row_number()
s over two different partitions (by id
vs by id
and null/not null val
). The difference between the row numbers defines the group.
Then, all that is left to do is assign new row numbers to each record having a null val
within the group it belongs to.
Query:
select
id,
date,
val,
case when val is null
then row_number() over(partition by id, rn1 - rn2 order by date)
else null
end foo
from (
select
t.*,
row_number()
over(order by id, date) rn1,
row_number()
over(partition by id, case when val is null then 1 else 0 end order by date ) rn2
from mytable t
) t
order by id, date
| id | date | val | foo |
| --- | ------------------------ | --- | --- |
| 1 | 2019-01-01T00:00:00.000Z | | 1 |
| 1 | 2019-01-02T00:00:00.000Z | | 2 |
| 1 | 2019-01-03T00:00:00.000Z | | 3 |
| 1 | 2019-01-04T00:00:00.000Z | 1 | |
| 1 | 2019-01-05T00:00:00.000Z | | 1 |
| 1 | 2019-01-06T00:00:00.000Z | | 2 |
| 1 | 2019-01-07T00:00:00.000Z | 1 | |
| 2 | 2019-01-02T00:00:00.000Z | 1 | |
| 2 | 2019-01-03T00:00:00.000Z | | 1 |
| 2 | 2019-01-04T00:00:00.000Z | 0 | |
| 2 | 2019-01-05T00:00:00.000Z | | 1 |
| 2 | 2019-01-06T00:00:00.000Z | | 2 |