sql postgresql duplicates inner-join sql-delete

How to keep only one entry among several in PostgreSQL database

I have a database that monitors a network (snapshots table, that contains a snapshot_date column). This production database was flooded by a faulty crontab, resulting in many snapshots for the same device every day.

I don't won't to remove everything, but i want to keep only one snapshot per snapshot_date and per device_id (column type is "timestamp without time zone") so to reduce the number of entries in this table.

I don't know any simple mechanism to do this in plain SQL. Can this be achieved ?

Solution

One option uses distinct on:

select distinct on (snapshot_date, device_id) *
from mytable 
order by snapshot_date, device_id, snapshot_id

This retains the one row per snapshot_date and device_id that has the smalles snapshot_id. Note that this assumes that snapshot_id is unique (or, at least, is unique for each (snapshot_date, device_id) tuple).

If you wanted a delete statement, then:

delete from mytable t
using (
    select snapshot_date, device_id, min(snapshot_id) snapshot_id
    from mytable 
    group by snapshot_date, device_id
) t1
where 
    t.snapshot_date = t1.snapshot_date
    and t.device_id = t1.device_id
    and t.snapshot_id < t1.id