Search code examples
sqlpostgresqlsubqueryaggregate-functionssql-delete

DELETE from multiple tables based on the same SELECT query


My table structure is as described in this post:

 name | version | processed | processing | updated  | ref_time 
------+---------+-----------+------------+----------+----------
 abc  |       1 | t         | f          | 27794395 | 27794160
 def  |       1 | t         | f          | 27794395 | 27793440
 ghi  |       1 | t         | f          | 27794395 | 27793440
 jkl  |       1 | f         | f          | 27794395 | 27794160
 mno  |       1 | t         | f          | 27794395 | 27793440
 pqr  |       1 | f         | t          | 27794395 | 27794160

Based on this answer, I am deriving a list of ref_time values which I want to use as a basis for deleting 'old' entries from status_table.

This is the query to generate the list of relevant ref_time values:

WITH main AS
(
    SELECT ref_time,
        ROUND(AVG(processed::int) * 100, 1) percent
    FROM status_table
    GROUP BY ref_time ORDER BY ref_time DESC, percent DESC
)
SELECT ref_time FROM main WHERE percent=100 OFFSET 2;

For example this might return:

 ref_time 
----------
 27794880
 27794160

I can then use this to DELETE all relevant entries in the status_table:

DELETE FROM status_table
WHERE ref_time IN 
(
    WITH main AS
    (
        SELECT ref_time,
            ROUND(AVG(processed::int) * 100, 1) percent
        FROM status_table
        GROUP BY ref_time ORDER BY ref_time DESC, percent DESC
    )
    SELECT ref_time FROM main WHERE percent=100 OFFSET 2
);

But I have another table named data_table, which also has a ref_time column, and I want to DELETE entries from that table on the same basis, i.e. any rows having ref_time in the above list.

How do I achieve this without duplicating the query used to generate the ref_time list?


Solution

  • You can use common table expressions:

    with 
        ref as (
            select ref_time 
            from status_table 
            group by ref_time 
            having bool_and(processed)
            order by ref_time desc limit 2
        ),
        del_ref as (
            delete from status_table s
            using ref r
            where s.ref_time = r.ref_time
        )
    delete from data_table d
    using ref r
    where d.ref_time = r.ref_time
    

    The first CTE,ref, returns the list of timestamps that you want to delete from the two other tables. I attempted to simplify the logic: you seem to want the top 2 timestamps that are fully processed (note that offset skips that many rows from the resultset, which is different than limit).

    The second CTE deletes from status_table, and the last part of the query addresses data_table.