Search code examples
sqlhivesequencegaps-and-islandsrow-number

ROW_NUMBER OR OTHER SEQUENCE DEPENDING ON DATE (SQL)


I have a problem creating sequence using row_number and still can't deal with it.

I have a table

bc io date
1a 11 2022-01-01
1a 11 2022-01-02
1a 12 2022-01-03
1a 11 2022-01-04

When I use simple row_number partitioned by bc and io and ordered by date I get this result

bc io date rn
1a 11 2022-01-01 1
1a 11 2022-01-02 2
1a 12 2022-01-03 1
1a 11 2022-01-04 3

But I need this result, when io changes, next io, that has met before already, should start with 1

bc io date rn
1a 11 2022-01-01 1
1a 11 2022-01-02 2
1a 12 2022-01-03 1
1a 11 2022-01-04 1

I tried to use this sql, but it is not correct

select tt.*,row_number() over(partition by tt.bc,tt.io order by tt.date ) as rn
from (
    select '1a' as bc, 11 as io, '2021-01-01' as date
    union all
    select '1a' as bc, 11 as io, '2021-01-02' as date
    union all
    select '1a' as bc, 12 as io, '2021-01-03' as date
    union all
    select '1a' as bc, 11 as io, '2021-01-04' as date
) as tt

Solution

  • This is a common gaps-and-islands problem: group consecutive attribute values per key (given some "time-like" dimension). The approach is this:

    • Calculate row_number per key ordered by time dimentsion.
    • Calculate row_number per key and attributes of interest ordered by time dimension.
    • Find their difference - this would group consecutive same values of attributes (the second row_number resets to 1 on a change of some attribute and the difference increases).

    Below is a query:

    with src as (
        select inline(array(
          struct('1a', 11, date '2022-01-01'),
          struct('1a', 11, date '2022-01-02'),
          struct('1a', 12, date '2022-01-03'),
          struct('1a', 11, date '2022-01-04')
        )) as (bc, io, dt)
    )
    , prepared as (
      select
        src.*
        /*Partition by keys*/
        , row_number() over(partition by bc order by dt asc)
            /*Partition by keys AND attributes to track changes and create groups*/
          - row_number() over(partition by bc, io order by dt asc) as rn_diff
      from src
    )
    select
      bc, io, dt
      /*Partition by keys AND attributes to track changes AND group number*/
      , row_number() over(partition by bc, io, rn_diff order by dt asc) as rn
    from prepared
    order by dt asc
    
    bc io dt rn
    1a 11 2022-01-01 1
    1a 11 2022-01-02 2
    1a 12 2022-01-03 1
    1a 11 2022-01-04 1

    dbfiddle based on Postgres (with more attributes added).