Search code examples
postgresqlrow-number

PostgreSQL how to generate a partition row_number() with certain numbers overridden


I have an unusual problem I'm trying to solve with SQL where I need to generate sequential numbers for partitioned rows but override specific numbers with values from the data, while not breaking the sequence (unless the override causes a number to be used greater than the number of rows present).

I feel I might be able to achieve this by selecting the rows where I need to override the generated sequence value and the rows I don't need to override the value, then unioning them together and somehow using coalesce to get the desired dynamically generated sequence value, or maybe there's some way I can utilise recursive.

I've not been able to solve this problem yet, but I've put together a SQL Fiddle which provides a simplified version:

http://sqlfiddle.com/#!17/236b5/5

The desired_dynamic_number is what I'm trying to generate and the generated_dynamic_number is my current work-in-progress attempt.

Any pointers around the best way to achieve the desired_dynamic_number values dynamically?

Update:

I'm almost there using lag:

http://sqlfiddle.com/#!17/236b5/24


Solution

  • step-by-step demo:db<>fiddle

    SELECT 
        *,
        COALESCE(                                               -- 3
            first_value(override_as_number) OVER w              -- 2
            , 1
        )
        + row_number() OVER w - 1                               -- 4, 5
    FROM (
        SELECT
            *,
            SUM(                                                -- 1
                CASE WHEN override_as_number IS NOT NULL THEN 1 ELSE 0 END
            ) OVER (PARTITION BY grouped_by ORDER BY secondary_order_by)
                as grouped
        FROM sample
    ) s
    WINDOW w AS (PARTITION BY grouped_by, grouped ORDER BY secondary_order_by)
    
    1. Create a new subpartition within your partitions: This cumulative sum creates a unique group id for every group of records which starts with a override_as_number <> NULL followed by NULL records. So, for instance, your (AAA, d) to (AAA, f) belongs to the same subpartition/group.
    2. first_value() gives the first value of such subpartition.
    3. The COALESCE ensures a non-NULL result from the first_value() function if your partition starts with a NULL record.
    4. row_number() - 1 creates a row count within a subpartition, starting with 0.
    5. Adding the first_value() of a subpartition with the row count creates your result: Beginning with the one non-NULL record of a subpartition (adding the 0 row count), the first following NULL records results in the value +1 and so forth.