database postgresql select window-functions gaps-and-islands

conditionally remove duplicates from database in postgres

I'd like to remove duplicates from the column 'value' but only if there was no change from the previous update. I read tutorials about lag and lead but couldn't find an example with removing duplicates.

Original:

+----+-------+-------+------------------------+
| ID | subID | value |       updated_at       |
+----+-------+-------+------------------------+
|  1 |     2 | 2.20  | 2020-02-16 07:36:25+01 |
|  1 |     2 | 2.20  | 2020-02-16 07:31:25+01 |
|  1 |     2 | 2.20  | 2020-02-16 07:26:25+01 |
|  1 |     2 | 2.30  | 2020-02-16 07:21:25+01 |
|  1 |     2 | 2.20  | 2020-02-16 07:16:25+01 |
|  1 |     2 | 2.20  | 2020-02-16 07:11:25+01 |
+----+-------+-------+------------------------+

Desired output:

+----+-------+-------+------------------------+
| ID | subID | value |       updated_at       |
+----+-------+-------+------------------------+
|  1 |     2 | 2.20  | 2020-02-16 07:36:25+01 |
|  1 |     2 | 2.30  | 2020-02-16 07:21:25+01 |
|  1 |     2 | 2.20  | 2020-02-16 07:16:25+01 | 
+----+-------+-------+------------------------+

Solution

I'd use lag or lead and remove by ctid:

DELETE FROM yourtable WHERE ctid IN
(
  SELECT
    ctid
  FROM 
  (
    SELECT 
      ctid,
      value,
      LAG(value) OVER(PARTITION BY id, subid ORDER BY updated_at) pre
    FROM 
      yourtable t
  ) t
  WHERE value = pre 
)

As with any delete query from the internet, run it against a copy of the table...