sql hive sequence gaps-and-islands row-number

ROW_NUMBER OR OTHER SEQUENCE DEPENDING ON DATE (SQL)

I have a problem creating sequence using row_number and still can't deal with it.

I have a table

bc	io	date
1a	11	2022-01-01
1a	11	2022-01-02
1a	12	2022-01-03
1a	11	2022-01-04

When I use simple row_number partitioned by bc and io and ordered by date I get this result

bc	io	date	rn
1a	11	2022-01-01	1
1a	11	2022-01-02	2
1a	12	2022-01-03	1
1a	11	2022-01-04	3

But I need this result, when io changes, next io, that has met before already, should start with 1

bc	io	date	rn
1a	11	2022-01-01	1
1a	11	2022-01-02	2
1a	12	2022-01-03	1
1a	11	2022-01-04	1

I tried to use this sql, but it is not correct

select tt.*,row_number() over(partition by tt.bc,tt.io order by tt.date ) as rn
from (
    select '1a' as bc, 11 as io, '2021-01-01' as date
    union all
    select '1a' as bc, 11 as io, '2021-01-02' as date
    union all
    select '1a' as bc, 12 as io, '2021-01-03' as date
    union all
    select '1a' as bc, 11 as io, '2021-01-04' as date
) as tt

Solution

This is a common gaps-and-islands problem: group consecutive attribute values per key (given some "time-like" dimension). The approach is this:

Calculate row_number per key ordered by time dimentsion.
Calculate row_number per key and attributes of interest ordered by time dimension.
Find their difference - this would group consecutive same values of attributes (the second row_number resets to 1 on a change of some attribute and the difference increases).

Below is a query:

with src as (
    select inline(array(
      struct('1a', 11, date '2022-01-01'),
      struct('1a', 11, date '2022-01-02'),
      struct('1a', 12, date '2022-01-03'),
      struct('1a', 11, date '2022-01-04')
    )) as (bc, io, dt)
)
, prepared as (
  select
    src.*
    /*Partition by keys*/
    , row_number() over(partition by bc order by dt asc)
        /*Partition by keys AND attributes to track changes and create groups*/
      - row_number() over(partition by bc, io order by dt asc) as rn_diff
  from src
)
select
  bc, io, dt
  /*Partition by keys AND attributes to track changes AND group number*/
  , row_number() over(partition by bc, io, rn_diff order by dt asc) as rn
from prepared
order by dt asc

bc	io	dt	rn
1a	11	2022-01-01	1
1a	11	2022-01-02	2
1a	12	2022-01-03	1
1a	11	2022-01-04	1

dbfiddle based on Postgres (with more attributes added).