Search code examples
postgresqlduplicatessql-deletetemp-tablesrow-number

Can I delete rows/replace table based on temp table


This code gives me a table of the unique values (without duplicates):

SELECT id, firstname, lastname, startdate, position
FROM  (
   SELECT id, firstname, lastname, startdate, position,
     ROW_NUMBER() OVER (PARTITION BY (firstname, lastname) ORDER BY startdate DESC) rn
   FROM people
   ) tmp
WHERE rn = 1;

What syntax would replace the current table with just the results of this one?

Alternatively, I could use WHERE rn <> 1 to get all the data I want to delete, but again, I am struggling to get the syntax of the DELETE right using this method.


Solution

  • Assuming values in firstname, lastname and startdate are never NULL, this simple query with a NOT EXISTS anti-semi-join does the job:

    DELETE FROM people AS p
    WHERE  EXISTS (
       SELECT FROM people AS p1
       WHERE  p1.firstname = p.firstname
       AND    p1.lastname  = p.lastname
       AND    p1.startdate > p.startdate
       );
    

    It deletes every row where a newer copy exists, effectively keeping the latest row per group of peers. (Of course, (firstname, lastname) is a poor way of establishing identity. There are many distinct people with identical names. The demo may be simplified ...)

    Can there be identical values in startdate? Then you need a tiebreaker ...

    Typically faster than using a subquery with row_number(). There are a hundred and one ways to make this faster, depending on your precise situation and requirements. See:

    If compared columns can be NULL, consider:

    There is a whole dedicated tag for . Combine it with to narrow down: