Search code examples
sqlpostgresqlgaps-and-islands

Update duplicate values to the next free number


I have a table like below:

CREATE TABLE "table" (
    "index" serial PRIMARY KEY,
    "number" integer
);

With these rows:

INSERT INTO "table" ("number") VALUES
(1), (2), (2), (3), (4), (4), (4), (5), (13), (13), (17);

I want to update duplicate values to the next free integer number (except for the 1st one).
For this case it should be like:

(1), (2), (3), (4), (5), (6), (7), (8), (13), (14), (17)

How could the UPDATE work?


Solution

  • Every next free number potentially depends on all previous updates in the process. So this needs, at its heart, a procedural solution.

    The best solution depends on cardinalities and the frequency of duplicates and gaps. According to your sample I assume:

    • few dupes
    • moderately more gaps

    The below code works in any case, but best for the stated assumptions.

    DO
    $do$
    DECLARE
       _id int;
       _number int;
    BEGIN
       CREATE TEMP TABLE free ON COMMIT DROP AS
       SELECT number
       FROM  (SELECT generate_series(min(number), max(number) + 10) FROM tbl) n(number)
       LEFT  JOIN tbl t USING (number)
       WHERE t.number IS NULL;
    
       -- (only) if table is big, add an index
       CREATE INDEX ON pg_temp.free (number);
       
       FOR _id, _number IN
          SELECT id, number
          FROM (
             SELECT *, lag(number) OVER (ORDER BY number) AS last_num
             FROM   tbl
             ) dup
          WHERE dup.last_num = dup.number
       LOOP
          WITH del AS (
             DELETE FROM pg_temp.free f
             USING (
                SELECT f1.number
                FROM   pg_temp.free f1
                WHERE  f1.number > _number
                ORDER  BY f1.number 
                LIMIT  1
                ) d
             WHERE f.number = d.number
             RETURNING f.number
             )
          UPDATE tbl t
          SET    number = d.number
          FROM   del d
          WHERE  t.id = _id;
       END LOOP;
    END
    $do$;
    

    fiddle

    This PL/pgSQL code block first creates a temporary table of free numbers (free) in the range of the given table tbl. I (arbitrarily) add 10 more numbers after the highest one. If you might need more than 10 additional numbers past the highest one, you need to do more.

    If that table comes out big, create an index.

    Then run through all duplicates, and assign the next free number, consuming it.

    Obviously, this algorithm assumes no concurrent writes.