Search code examples
sqlpostgresqlwindow-functionsgaps-and-islands

SQL Gaps and Islands fails with 3 rows


Considering the following data

# SELECT * FROM foobar;
 id | quantity |    time
----+----------+------------
  1 |       50 | 2022-01-01
  2 |      100 | 2022-01-02
  3 |       50 | 2022-01-03
  4 |       50 | 2022-01-04

(note: the 4th row is simply here to illustrate the expected result, it's not necessary to reproduce the issue)

I want to extract three distinct groups based on two assumptions: 1) a new group should be created everytime quantity changes and 2) consecutive identical quantities should be merged in the same group. The end result should look something like this

 id | quantity |    time    |  group_id
----+----------+------------+-------------
  1 |       50 | 2022-01-01 |           1
  2 |      100 | 2022-01-02 |           2
  3 |       50 | 2022-01-03 |           3
  4 |       50 | 2022-01-04 |           3

I've tried implementing a gaps and island solution to this problem but it fails with this specific data, as the group end up merging rows with different quantity. Note that swapping row 2 and 3 fixes the issue. Using DENSE_RANK results in the same problem.

# SELECT
    *,
    ROW_NUMBER() OVER (ORDER BY time) as global_rank,
    ROW_NUMBER() OVER (PARTITION BY quantity ORDER BY time) as qty_counter
FROM foobar;
 id | quantity |    time    | global_rank | qty_counter
----+----------+------------+-------------+-------------
  1 |       50 | 2022-01-01 |           1 |           1    # global_rank - qty_counter = 0 
  2 |      100 | 2022-01-02 |           2 |           1    # global_rank - qty_counter = 1 
  3 |       50 | 2022-01-03 |           3 |           2    # global_rank - qty_counter = 1 
  4 |       50 | 2022-01-04 |           4 |           3    # global_rank - qty_counter = 1 
(4 rows)

How can I change this query to get the expected result ?


Solution

  • I find that this is simpler solved with lag(); the idea is to compare each quantity to the previous value, and increment a counter when they do not match.

    select id, quantity, time, 
        count(*) filter(where quantity is distinct from lag_quantity) over(order by time) as grp
    from (
        select f.*, lag(quantity) over(order by time) lag_quantity
        from foobar f
    ) f
    order by time
    
    id quantity time grp
    1 50 2022-01-01 1
    2 100 2022-01-02 2
    3 50 2022-01-03 3
    4 50 2022-01-04 3

    fiddle