Search code examples
sqlsnowflake-cloud-data-platformwindow-functionsgaps-and-islands

SQL, rank for each instance of a partition


I am trying to to create a rank for each instance of a status occurring, for example

ID Status From_date To_date rank
1 Available 2022-01-01 2022-01-02 1
1 Available 2022-01-02 2022-01-03 1
1 Unavailable 2022-01-03 2022-01-10 2
1 Available 2022-01-10 2022-01-20 3

For each ID, for each instance of a status occurring, by from_date ascending.

I want to do this as i see this as the best way of getting to the final result i want which is

ID Status From_date To_date rank
1 Available 2022-01-01 2022-01-03 1
1 Unavailable 2022-01-03 2022-01-10 2
1 Available 2022-01-10 2022-01-20 3

I tried dense_rank(partition by id order by status, from_date but can see now why that wouldnt work. Not sure how to get to this result.


Solution

  • So with this CTE for the data:

    with data(ID, Status, From_date, To_date) as (
        select * from values
            (1, 'Available',    '2022-01-01', '2022-01-02'),
            (1, 'Available',    '2022-01-02', '2022-01-03'),
            (1, 'Unavailable',  '2022-01-03', '2022-01-10'),
            (1, 'Available',    '2022-01-10', '2022-01-20')
    )
    

    the first result, being rank can be done with CONDITIONAL_CHANGE_EVENT:

    select *
        ,CONDITIONAL_CHANGE_EVENT( Status ) OVER ( PARTITION BY ID ORDER BY From_date ) as rank
    from data;
    
    ID STATUS FROM_DATE TO_DATE RANK
    1 Available 2022-01-01 2022-01-02 0
    1 Available 2022-01-02 2022-01-03 0
    1 Unavailable 2022-01-03 2022-01-10 1
    1 Available 2022-01-10 2022-01-20 2

    and thus the keeps the first of each rank can be achieved with a QUALIFY/ROW_NUMBER, because the CONDITIONAL_CHANGE is a complex operation, needs wrapping in a sub-select, so the answer is not as short as I would like:

    select * from (
        select *
            ,CONDITIONAL_CHANGE_EVENT( Status ) OVER ( PARTITION BY ID ORDER BY From_date ) as rank
        from data
    )
    qualify row_number() over(partition by id, rank ORDER BY From_date ) = 1
    

    gives:

    ID STATUS FROM_DATE TO_DATE RANK
    1 Available 2022-01-01 2022-01-02 0
    1 Unavailable 2022-01-03 2022-01-10 1
    1 Available 2022-01-10 2022-01-20 2

    Also, the final result minus the ranking can be done with:

    select *
    from data
    qualify nvl(Status <> lag(status) over ( PARTITION BY ID ORDER BY From_date ), true)
    
    ID STATUS FROM_DATE TO_DATE
    1 Available 2022-01-01 2022-01-02
    1 Unavailable 2022-01-03 2022-01-10
    1 Available 2022-01-10 2022-01-20

    and thus a rank can be added at the end

    select *
        ,rank() over ( PARTITION BY ID ORDER BY From_date ) as rank
    from (
        select *
        from data
        qualify nvl(Status <> lag(status) over ( PARTITION BY ID ORDER BY From_date ), true)
    )
    
    ID STATUS FROM_DATE TO_DATE RANK
    1 Available 2022-01-01 2022-01-02 1
    1 Unavailable 2022-01-03 2022-01-10 2
    1 Available 2022-01-10 2022-01-20 3