I have a problem creating sequence using row_number
and still can't deal with it.
I have a table
bc | io | date |
---|---|---|
1a | 11 | 2022-01-01 |
1a | 11 | 2022-01-02 |
1a | 12 | 2022-01-03 |
1a | 11 | 2022-01-04 |
When I use simple row_number
partitioned by bc
and io
and ordered by date
I get this result
bc | io | date | rn |
---|---|---|---|
1a | 11 | 2022-01-01 | 1 |
1a | 11 | 2022-01-02 | 2 |
1a | 12 | 2022-01-03 | 1 |
1a | 11 | 2022-01-04 | 3 |
But I need this result, when io
changes, next io
, that has met before already, should start with 1
bc | io | date | rn |
---|---|---|---|
1a | 11 | 2022-01-01 | 1 |
1a | 11 | 2022-01-02 | 2 |
1a | 12 | 2022-01-03 | 1 |
1a | 11 | 2022-01-04 | 1 |
I tried to use this sql, but it is not correct
select tt.*,row_number() over(partition by tt.bc,tt.io order by tt.date ) as rn
from (
select '1a' as bc, 11 as io, '2021-01-01' as date
union all
select '1a' as bc, 11 as io, '2021-01-02' as date
union all
select '1a' as bc, 12 as io, '2021-01-03' as date
union all
select '1a' as bc, 11 as io, '2021-01-04' as date
) as tt
This is a common gaps-and-islands problem: group consecutive attribute values per key (given some "time-like" dimension). The approach is this:
row_number
per key ordered by time dimentsion.row_number
per key and attributes of interest ordered by time dimension.row_number
resets to 1 on a change of some attribute and the difference increases).Below is a query:
with src as (
select inline(array(
struct('1a', 11, date '2022-01-01'),
struct('1a', 11, date '2022-01-02'),
struct('1a', 12, date '2022-01-03'),
struct('1a', 11, date '2022-01-04')
)) as (bc, io, dt)
)
, prepared as (
select
src.*
/*Partition by keys*/
, row_number() over(partition by bc order by dt asc)
/*Partition by keys AND attributes to track changes and create groups*/
- row_number() over(partition by bc, io order by dt asc) as rn_diff
from src
)
select
bc, io, dt
/*Partition by keys AND attributes to track changes AND group number*/
, row_number() over(partition by bc, io, rn_diff order by dt asc) as rn
from prepared
order by dt asc
bc | io | dt | rn |
---|---|---|---|
1a | 11 | 2022-01-01 | 1 |
1a | 11 | 2022-01-02 | 2 |
1a | 12 | 2022-01-03 | 1 |
1a | 11 | 2022-01-04 | 1 |
dbfiddle based on Postgres (with more attributes added).