postgresql duplicates sql-delete temp-tables row-number

Can I delete rows/replace table based on temp table

This code gives me a table of the unique values (without duplicates):

SELECT id, firstname, lastname, startdate, position
FROM  (
   SELECT id, firstname, lastname, startdate, position,
     ROW_NUMBER() OVER (PARTITION BY (firstname, lastname) ORDER BY startdate DESC) rn
   FROM people
   ) tmp
WHERE rn = 1;

What syntax would replace the current table with just the results of this one?

Alternatively, I could use WHERE rn <> 1 to get all the data I want to delete, but again, I am struggling to get the syntax of the DELETE right using this method.

Solution

Assuming values in firstname, lastname and startdate are never NULL, this simple query with a NOT EXISTS anti-semi-join does the job:

DELETE FROM people AS p
WHERE  EXISTS (
   SELECT FROM people AS p1
   WHERE  p1.firstname = p.firstname
   AND    p1.lastname  = p.lastname
   AND    p1.startdate > p.startdate
   );

It deletes every row where a newer copy exists, effectively keeping the latest row per group of peers. (Of course, (firstname, lastname) is a poor way of establishing identity. There are many distinct people with identical names. The demo may be simplified ...)

Can there be identical values in startdate? Then you need a tiebreaker ...

Typically faster than using a subquery with row_number(). There are a hundred and one ways to make this faster, depending on your precise situation and requirements. See:

How do I (or can I) SELECT DISTINCT on multiple columns?

If compared columns can be NULL, consider:

How to delete duplicate rows without unique identifier

There is a whole dedicated tag for duplicate-removal. Combine it with postgres to narrow down:

https://stackoverflow.com/questions/tagged/duplicates+postgresql