Tags: oracle-database, database-performance, oracle12c, sql, performance, cost-based-optimizer

Why doesn't the optimizer plan correlate with experimental query runs?


Suppose we have the following problem:

  • Given a table with one column 'X', containing some rows with random integers from 1 to 100:

    CREATE TABLE xtable(x) AS 
       SELECT ceil(dbms_random.value * 100) 
         FROM dual
       CONNECT BY level <= 1000000;
    
  • We must delete the duplicate rows so that only the distinct integers remain in the table.
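To check that a solution meets the requirement, the total and distinct counts can be compared; a minimal verification sketch:

```sql
-- After a successful dedup every remaining value is unique,
-- so the two counts must be equal (and at most 100 here)
SELECT COUNT(*) AS total, COUNT(DISTINCT x) AS distinct_x
  FROM xtable;
```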


Let's consider the three solutions (with average execution times and optimizer plans) below.

I must add that experiments show:

  • Solutions 1 and 2 are scalable and show linear time growth as the row count increases (tested with tables of up to 10 million rows)
  • Solution 3 shows exponential time growth, approximately 3 * exp(0.6 * N)

We see that for solution 2 the optimizer plan gives expectations unrelated to the experimental results, and even opposite to them:

  • the cost and other values are almost the same in plans 2 and 3
  • the execution times are practically the same for solutions 1 and 2

And in these experiments, the presence or absence of gathered statistics for the table affects neither the optimizer plans nor the execution times.
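(For reference, statistics were toggled with the standard DBMS_STATS calls; a sketch, assuming XTABLE lives in the current schema:)

```sql
-- Gather optimizer statistics for XTABLE in the current schema
BEGIN
   DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'XTABLE');
END;
/
-- ...or remove them again before re-running the experiment
BEGIN
   DBMS_STATS.DELETE_TABLE_STATS(ownname => USER, tabname => 'XTABLE');
END;
/
```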


Please explain why I can't trust the optimizer plan in case 2.

What causes the optimizer to ignore the obvious difference between linear and exponential complexity?


Solutions:
1.

DELETE xtable WHERE rowid IN (
      SELECT ri FROM (
         SELECT rowid                                            AS ri,
                row_number() OVER(PARTITION BY x ORDER BY null) AS rn
           FROM xtable
      )
      WHERE rn > 1
);


Execution time: 14-16 sec

Plan:
------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows    | Bytes    | Cost | Time     |
------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT         |          | 1000000 | 15000000 | 5119 | 00:00:01 |
|   1 |   DELETE                 | XTABLE   |         |          |      |          |
| * 2 |    HASH JOIN SEMI        |          | 1000000 | 15000000 | 5119 | 00:00:01 |
|   3 |     TABLE ACCESS FULL    | XTABLE   | 1000000 |  3000000 |  280 | 00:00:01 |
|   4 |     VIEW                 | VW_NSO_1 | 1000000 | 12000000 | 2976 | 00:00:01 |
| * 5 |      VIEW                |          | 1000000 | 25000000 | 2976 | 00:00:01 |
|   6 |       WINDOW SORT        |          | 1000000 |  3000000 | 2976 | 00:00:01 |
|   7 |        TABLE ACCESS FULL | XTABLE   | 1000000 |  3000000 |  280 | 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 2 - access(ROWID="RI")
* 5 - filter("RN">1)

2.

DELETE xtable WHERE (x, rowid) NOT IN (SELECT x, MIN(rowid) FROM xtable GROUP BY x);

Execution time: 15-17 sec

Plan:
--------------------------------------------------------------------------------------
| Id | Operation                 | Name   | Rows    | Bytes   | Cost      | Time     |
--------------------------------------------------------------------------------------
|  0 | DELETE STATEMENT          |        |   50000 |  150000 | 278162850 | 03:01:06 |
|  1 |   DELETE                  | XTABLE |         |         |           |          |
|  2 |    FILTER                 |        |         |         |           |          |
|  3 |     TABLE ACCESS FULL     | XTABLE | 1000000 | 3000000 |       281 | 00:00:01 |
|  4 |     FILTER                |        |         |         |           |          |
|  5 |      SORT GROUP BY NOSORT |        | 1000000 | 3000000 |       280 | 00:00:01 |
|  6 |       TABLE ACCESS FULL   | XTABLE | 1000000 | 3000000 |       280 | 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - access(INTERNAL_FUNCTION("X")=INTERNAL_FUNCTION("X") AND INTERNAL_FUNCTION(ROWID)=INTERNAL_FUNCTION("MIN(ROWID)"))
* 5 - filter(INTERNAL_FUNCTION(ROWID)=INTERNAL_FUNCTION("MIN(ROWID)") AND INTERNAL_FUNCTION("X")=INTERNAL_FUNCTION("X"))

3.

DELETE xtable a WHERE EXISTS (SELECT 1 FROM xtable b WHERE a.x = b.x AND a.rowid < b.rowid);

Execution time: 970-990 sec

Plan:
----------------------------------------------------------------------------------------------
| Id  | Operation                        | Name   | Rows    | Bytes   | Cost      | Time     |
----------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT                 |        |   50000 |  300000 | 278208956 | 03:01:08 |
|   1 |   DELETE                         | XTABLE |         |         |           |          |
| * 2 |    FILTER                        |        |         |         |           |          |
|   3 |     NESTED LOOPS SEMI            |        |   50000 |  300000 | 278208956 | 03:01:08 |
|   4 |      TABLE ACCESS FULL           | XTABLE | 1000000 | 3000000 |       280 | 00:00:01 |
| * 5 |      TABLE ACCESS BY ROWID RANGE | XTABLE |   50000 |  150000 |       278 | 00:00:01 |
----------------------------------------------------------------------------------------------    
Predicate Information (identified by operation id):
------------------------------------------
* 2 - filter(:VAR2=:VAR1)
* 5 - access("B".ROWID>"A".ROWID)

Plans were obtained on Oracle 12.1.0.2.0


Solution

  • Please, explain why I can't trust the optimizer plan in case 2.

You should never trust the optimizer blindly. The CBO is right about 95% of the time, but you do not know which 5% is wrong.

A typical problem is that the execution plan shown by EXPLAIN PLAN does not match the plan actually used at execution time. (You do not say how you obtained the plans.)

When in doubt, use DBMS_SQLTUNE.REPORT_SQL_MONITOR for long-running queries to see the actual plan and the problematic parts.
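For example (a sketch; the sql_id shown is a placeholder you would look up in V$SQL):

```sql
-- Actual plan with runtime row counts (needs STATISTICS_LEVEL=ALL or the
-- GATHER_PLAN_STATISTICS hint in the monitored statement)
SELECT *
  FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('0abc123def456', NULL, 'ALLSTATS LAST'));

-- SQL Monitor report for a long-running statement (Tuning Pack required)
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(sql_id => '0abc123def456', type => 'TEXT')
  FROM dual;
```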

    What causes the optimizer to ignore the obvious difference between linear and exponential complexity?

    See above, and forget about comparing the costs of the plans. What you want to avoid while processing a whole table is NESTED LOOPS processing. This is exactly what happens in case 3.

     |  3 |     NESTED LOOPS SEMI            |       |   50000|  300000 | 278208956 | 03:01:08|
     |  4 |      TABLE ACCESS FULL           |XTABLE | 1000000| 3000000 |       280 | 00:00:01|
     |  5 |      TABLE ACCESS BY ROWID RANGE |XTABLE |   50000|  150000 |       278 | 00:00:01|
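    If you wanted to keep the EXISTS form of case 3, one could try nudging the optimizer toward a hash semi-join on the equality predicate; a sketch, assuming the legacy UNNEST/HASH_SJ subquery hints are still honored on 12c (verify the effect in the actual plan):

    ```sql
    -- The hints ask Oracle to unnest the subquery into a HASH JOIN SEMI
    -- on a.x = b.x, with the rowid inequality applied as a join filter
    DELETE xtable a
     WHERE EXISTS (SELECT /*+ UNNEST HASH_SJ */ 1
                     FROM xtable b
                    WHERE a.x = b.x
                      AND a.rowid < b.rowid);
    ```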
    

    You want to see SORT and HASH JOIN operations, which is exactly what plan 1 shows.

    In my opinion, plan 2 will not scale with the number of duplicated records (simply try it with a table containing each row twice and see if you get the same elapsed time as in case 3). The optimizer cannot estimate the number of duplicated records, so it defensively assumes a high number and therefore a high cost.
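    The suggested experiment could be set up like this (a sketch; the table name and the value spread are assumptions, mirroring the original setup):

    ```sql
    -- 500,000 distinct values, each present exactly twice, so the number
    -- of duplicates grows linearly with the table size
    CREATE TABLE xdup(x) AS
       SELECT MOD(level, 500000)
         FROM dual
      CONNECT BY level <= 1000000;
    ```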

    Second-to-last remark - in theory you should not observe linear behaviour, but at best O(n * log(n)).

    Last remark - your test data are not realistic for duplicate removal. Typically you have a large table with a small number of dups. In your setup, all records except 100 are dups.

    The cost of the delete dominates the cost of finding the dups, so you observe linear behaviour.

    Try with

    CREATE TABLE xtable(x) AS 
       SELECT ceil(dbms_random.value * 100000000) 
         FROM dual
       CONNECT BY level <= 1000000;
    
    SELECT COUNT(*) AS total, COUNT(*) - COUNT(DISTINCT x) AS to_be_deleted FROM xtable;
         TOTAL TO_BE_DELETED
    ---------- -------------
       1000000          5083  
    

    So you will remove about 0.5% of the records. Now scale up and you will observe a completely different pattern.