Search code examples
postgresqlsql-deletepostgresql-9.5postgresql-performance

How to get Postgresql total cost time from explain


I have an sql query on postgresql 9.5, but it takes too long time. And I run the explain query:

DELETE FROM source v1
    WHERE id < (SELECT MAX(id)
            FROM source v2
            WHERE v2.ent_id = v1.ent_id
            AND v2.name = v1.name
        );      

And ex plain is

Delete on source v1  (cost=0.00..1764410287608.21 rows=2891175 width=6)');
  ->  Seq Scan on source v1  (cost=0.00..1764410287608.21 rows=2891175 width=6)');
        Filter: (id < (SubPlan 2))');
        SubPlan 2');
          ->  Result  (cost=203424.76..203424.77 rows=1 width=0)');
                InitPlan 1 (returns $2)');
                  ->  Limit  (cost=0.43..203424.76 rows=1 width=8)');
                        ->  Index Scan Backward using source_id_ix on source v2  (cost=0.43..813697.74 rows=4 width=8)');
                              Index Cond: (id IS NOT NULL)');
                              Filter: (((ent_id)::text = (v1.ent_id)::text) AND ((name)::text = (v1.name)::text))');

My table has about 8.000.000 records. And I could not get the result for days. And I could not calculate how many times will take? is there any way for a new solution?


Solution

  • There is no really good way to predict execution time.

    As a very rough rule of thumb, you can compare a cost of 1 to the time of reading one 8 KB page from disk during a sequential scan, but that will often be off by more than an order of magnitude.

    To solve the underlying problem, try

    DELETE FROM source AS v1
    WHERE EXISTS (SELECT 1
                  FROM source AS v2
                  WHERE (v1.ent_id, v1.name) = (v2.ent_id, v2.name)
                  AND v2.id > v1.id);
    

    The problem with your query is that it has to execute an expensive subselect for every row found, while mine can perform a semijoin. Look at my query's execution plan.