Search code examples
oraclesql-updatequery-optimizationoracle12csql-merge

Massive UPDATE vs. MERGE performance on Oracle


I am trying to update a destination table from a source table using a single, massive UPDATE statement, but the execution time is way longer than it should.

Query

UPDATE MY_DEST
SET (DEST_B, DEST_C) = (
  SELECT SRC_A + SRC_B, SRC_B
  FROM MY_SRC
  WHERE SRC_KEY = DEST_KEY AND SRC_DATE = DEST_DATE
);

Both tables contain roughly 10 to 13 million rows, they have matching primary keys and we can safely assume that each row in the destination table has a corresponding row in the source table.

Table definitions

CREATE TABLE MY_SRC (
  SRC_KEY VARCHAR2(50),
  SRC_DATE DATE,
  SRC_A NUMBER(15,2),
  SRC_B NUMBER(15,2),
  CONSTRAINT MY_SRC_PK PRIMARY KEY (SRC_KEY, SRC_DATE)
);

CREATE TABLE MY_DEST (
  DEST_KEY VARCHAR2(50),
  DEST_DATE DATE,
  DEST_B NUMBER(15,2),
  DEST_C NUMBER(15,2),
  CONSTRAINT MY_DEST_PK PRIMARY KEY (DEST_KEY, DEST_DATE)
);

Execution plan

Plan hash value: 3904754293

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT             |           |    12M|   675M|   128M (20)| 01:23:55 |
|   1 |  UPDATE                      | MY_DEST   |       |       |            |          |
|   2 |   TABLE ACCESS FULL          | MY_DEST   |    12M|   675M| 69756   (1)| 00:00:03 |
|   3 |   TABLE ACCESS BY INDEX ROWID| MY_SRC    |     1 |    46 |     4   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | MY_SRC_PK |     1 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("SRC_KEY"=:B1 AND "SRC_DATE"=:B2)

Questions

  1. Is the above UPDATE query hopelessly forced to use a slow, row-by-row execution plan?

  2. Can I optimize the above query only by rewriting it as a MERGE statement, such as the following one?

    • Alternate query, rewritten as a MERGE

      MERGE INTO MY_DEST
      USING (SELECT SRC_KEY, SRC_DATE, SRC_B, SRC_A + SRC_B AS SRC_C FROM MY_SRC)
      ON (DEST_KEY = SRC_KEY AND DEST_DATE = SRC_DATE)
      WHEN MATCHED THEN UPDATE SET DEST_B = SRC_B, DEST_C = SRC_C;
      
    • Alternate execution plan

      Plan hash value: 2444580570
      
      ----------------------------------------------------------------------------------------
      | Id  | Operation            | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
      ----------------------------------------------------------------------------------------
      |   0 | MERGE STATEMENT      |         |    12M|   638M|       |   359K  (1)| 00:00:15 |
      |   1 |  MERGE               | MY_DEST |       |       |       |            |          |
      |   2 |   VIEW               |         |       |       |       |            |          |
      |*  3 |    HASH JOIN         |         |    12M|  2260M|   716M|   359K  (1)| 00:00:15 |
      |   4 |     TABLE ACCESS FULL| MY_SRC  |    12M|   568M|       |   162K  (1)| 00:00:07 |
      |   5 |     TABLE ACCESS FULL| MY_DEST |    12M|  1695M|       | 69756   (1)| 00:00:03 |
      ----------------------------------------------------------------------------------------
      
      Predicate Information (identified by operation id):
      ---------------------------------------------------
      
         3 - access("DEST_KEY"="SRC_KEY" AND "DEST_DATE"="SRC_DATE")
      
  3. Can I get the same, good performance of the alternate MERGE statement when using an UPDATE?

I am using Oracle 12c.


Solution

  • 1) Yes, you never want to make an (estimated) 12M times an index access on a large table.

    The index blocks will be largely on the disc, so you may expect about access to 100 rows per seconds.

    Even if you get 1000 index aceess per second due to caching, you still have to wait hours for processing 12M rows.

    So the execution plan 1 is NO GO for a large table.

    2) The execution plan of the MERGE (hash join with two full table scans) looks fine and should work.

    3) Yes you can use UPDATE statement without problems in your setup. You will use the feature of the updatable join view

    Query

    update (
    select SRC_A, SRC_B,DEST_B, DEST_C 
    from MY_DEST d
    left outer join MY_SRC s
    on d.DEST_KEY = s.SRC_KEY and d.DEST_DATE = s.SRC_DATE)
    set DEST_B = SRC_A + SRC_B,
        DEST_C = SRC_B
    ; 
    

    Execution plan

    ---------------------------------------------------------------------------------------
    | Id  | Operation           | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------------------
    |   0 | UPDATE STATEMENT    |         |  1000K|    57M|       |  6682   (1)| 00:00:01 |
    |   1 |  UPDATE             | MY_DEST |       |       |       |            |          |
    |*  2 |   HASH JOIN OUTER   |         |  1000K|    57M|    40M|  6682   (1)| 00:00:01 |
    |   3 |    TABLE ACCESS FULL| MY_DEST |  1000K|    28M|       |  1341   (2)| 00:00:01 |
    |   4 |    TABLE ACCESS FULL| MY_SRC  |  1000K|    28M|       |  1341   (2)| 00:00:01 |
    ---------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
     
       2 - access("D"."DEST_DATE"="S"."SRC_DATE"(+) AND 
                  "D"."DEST_KEY"="S"."SRC_KEY"(+))
              
    

    You see the similar execution plan is created as for the MERGE, so you will excpect also similar performance.

    Final Notes

    You may additionally use parallel hint to speed up.

    Don't forget you must enable parallel dml in your session.

    ALTER SESSION ENABLE PARALLEL DML;
    

    Your MERGE statement is not equal to your first UPDATE statement. The difference appears when a primary key exists in the destination table but not in the source table.

    The UPDATE resets the destination columns to NULL while the MERGE let them unchanged.

    My UPDATEstatement use an outer join so it behaves as your UPDATE - switch to inner join to get the MERGE behaviour.