Search code examples
sqlpostgresqlduplicatesinner-joinsql-delete

How to keep only one entry among several in PostgreSQL database


I have a database that monitors a network (snapshots table, that contains a snapshot_date column). This production database was flooded by a faulty crontab, resulting in many snapshots for the same device every day.

enter image description here

I don't won't to remove everything, but i want to keep only one snapshot per snapshot_date and per device_id (column type is "timestamp without time zone") so to reduce the number of entries in this table.

I don't know any simple mechanism to do this in plain SQL. Can this be achieved ?


Solution

  • One option uses distinct on:

    select distinct on (snapshot_date, device_id) *
    from mytable 
    order by snapshot_date, device_id, snapshot_id
    

    This retains the one row per snapshot_date and device_id that has the smalles snapshot_id. Note that this assumes that snapshot_id is unique (or, at least, is unique for each (snapshot_date, device_id) tuple).

    If you wanted a delete statement, then:

    delete from mytable t
    using (
        select snapshot_date, device_id, min(snapshot_id) snapshot_id
        from mytable 
        group by snapshot_date, device_id
    ) t1
    where 
        t.snapshot_date = t1.snapshot_date
        and t.device_id = t1.device_id
        and t.snapshot_id < t1.id