Search code examples
sqlpostgresqlgaps-and-islandsdense-rank

Numbering rows based on multiple fields changes (incluging an "invisible" one) in PostgreSQL


I had a look at the previous topics, but I cannot achieve what I want.

I have a table like this :

id   status     update_date
---  ---        ---
A    PENDING    2020-11-01
A    PENDING    2020-11-02
A    CONFIRMED  2020-11-03
A    CONFIRMED  2020-11-04
A    CONFIRMED  2020-11-05
A    PENDING    2020-11-06
A    PAID       2020-11-07
B    CONFIRMED  2020-11-02
etc.

and I want to have this :

id   status     rank
---  ---        ---
A    PENDING    1
A    CONFIRMED  2
A    PENDING    3
A    PAID       4
B    CONFIRMED  1
etc.

meaning taking into account the update_date (and of course the status change) to sort and number the rows, but NOT having the order date in the final result

PS: as you can see, I can go back and forth from one status to the other ( PENDING -> CONFIRMED -> PENDING -> etc.) multiple times

Thanks lot !


Solution

  • You can address this as a gaps-and-island problem. The difference between row numbers gives you the group each record belongs to, that you can then use to aggregate:

    select id, status, 
        row_number() over(partition by id order by min(update_date)) as rn
    from (
        select t.*,
            row_number() over(partition by id order by update_date) rn1,
            row_number() over(partition by id, status order by update_date) rn2
        from mytable t
    ) t
    group by id, status, rn1 - rn2
    order by id, min(update_date) 
    

    Demo on DB Fiddle:

    id | status    | rn
    :- | :-------- | -:
    A  | PENDING   |  1
    A  | CONFIRMED |  2
    A  | PENDING   |  3
    A  | PAID      |  4
    B  | CONFIRMED |  1