I have a database that monitors a network (snapshots table, that contains a snapshot_date column). This production database was flooded by a faulty crontab, resulting in many snapshots for the same device every day.
I don't won't to remove everything, but i want to keep only one snapshot per snapshot_date and per device_id (column type is "timestamp without time zone") so to reduce the number of entries in this table.
I don't know any simple mechanism to do this in plain SQL. Can this be achieved ?
One option uses distinct on
:
select distinct on (snapshot_date, device_id) *
from mytable
order by snapshot_date, device_id, snapshot_id
This retains the one row per snapshot_date
and device_id
that has the smalles snapshot_id
. Note that this assumes that snapshot_id
is unique (or, at least, is unique for each (snapshot_date, device_id)
tuple).
If you wanted a delete
statement, then:
delete from mytable t
using (
select snapshot_date, device_id, min(snapshot_id) snapshot_id
from mytable
group by snapshot_date, device_id
) t1
where
t.snapshot_date = t1.snapshot_date
and t.device_id = t1.device_id
and t.snapshot_id < t1.id