Search code examples
sqloracle

Oracle Merge statement 'ON' vs 'UPDATE' conditions performance on target table


I have MERGE query in Oracle DB.

MERGE INTO TABLE_1 T1
USING 
(
SELECT DISTINCT SMS_CENTER_ID FROM TABLE_2 T2 WHERE T2.DELIVERY_TIME >= ? AND T2.DELIVERY_TIME < ?) VIEW_1 
ON
(T1.SMS_CENTER_ID = VIEW_1.SMS_CENTER_ID AND T1.CREATE_TIME >= ? AND T1.CREATE_TIME < ?)

WHEN MATCHED THEN UPDATE
SET T1.DELETED = 1

Questions:

  1. Can the following change improve performance on large amounts of data? (such that the duration of the query is up to 5 hours)?

    MERGE INTO TABLE_1 T1
    USING 
    (
    SELECT DISTINCT SMS_CENTER_ID FROM TABLE_2 T2 WHERE T2.DELIVERY_TIME >= ? AND T2.DELIVERY_TIME < ?) VIEW_1 
    ON
    (T1.SMS_CENTER_ID = VIEW_1.SMS_CENTER_ID)
    
    WHEN MATCHED THEN UPDATE
    SET T1.DELETED = 1
    WHERE T1.CREATE_TIME >= ? AND T1.CREATE_TIME < ?
    
  2. Or does query performance stay the same?


Solution

    1. Can the following change improve performance on large amounts of data? (such that the duration of the query is up to 5 hours)?

      [...]

    2. Or does query performance stay the same?

    Neither option, moving the filter from the ON clause to a WHERE clause in the UPDATE part of the MERGE will probably make the performance worse (rather than better or the same).


    Given the sample data:

    CREATE TABLE table_1 (sms_center_id, create_time, deleted) AS
    SELECT 1, DATE '2024-01-01' + LEVEL - 1, 0
    FROM   DUAL
    CONNECT BY DATE '2024-01-01' + LEVEL - 1 < DATE '2025-01-01';
    
    CREATE TABLE table_2 (sms_center_id, delivery_time) AS
    SELECT 1, DATE '2024-01-01' FROM   DUAL;
    

    (Note: table_1 has 366 rows and table_2 has 1 row.)

    Then you can ensure the statistics are up-to-date on both tables:

    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS( USER, 'TABLE_1');
      DBMS_STATS.GATHER_TABLE_STATS( USER, 'TABLE_2');
    END;
    /
    

    Then trying to update 200 out of the 366 rows:

    EXPLAIN PLAN FOR
    MERGE INTO TABLE_1 T1
    USING (
      SELECT DISTINCT SMS_CENTER_ID
      FROM   TABLE_2 T2
      WHERE  T2.DELIVERY_TIME >= DATE '2024-01-01'
      AND    T2.DELIVERY_TIME <  DATE '2025-01-01'
    ) VIEW_1 
    ON (
          T1.SMS_CENTER_ID = VIEW_1.SMS_CENTER_ID
      AND T1.CREATE_TIME >= DATE '2024-01-01'
      AND T1.CREATE_TIME <  DATE '2024-01-01' + 200
    )
    WHEN MATCHED THEN UPDATE
      SET T1.DELETED = 1;
    
    SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
    

    Generates the plan:

    ----------------------------------------------------------------------------------
    | Id  | Operation              | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------
    |   0 | MERGE STATEMENT        |         |   201 |   402 |     7  (15)| 00:00:01 |
    |   1 |  MERGE                 | TABLE_1 |       |       |            |          |
    |   2 |   VIEW                 |         |       |       |            |          |
    |*  3 |    HASH JOIN           |         |   201 |  5226 |     7  (15)| 00:00:01 |
    |   4 |     VIEW               |         |     1 |    13 |     4  (25)| 00:00:01 |
    |   5 |      SORT UNIQUE       |         |     1 |    11 |     4  (25)| 00:00:01 |
    |*  6 |       TABLE ACCESS FULL| TABLE_2 |     1 |    11 |     3   (0)| 00:00:01 |
    |*  7 |     TABLE ACCESS FULL  | TABLE_1 |   201 |  2613 |     3   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
     
       3 - access("T1"."SMS_CENTER_ID"="VIEW_1"."SMS_CENTER_ID")
       6 - filter("T2"."DELIVERY_TIME">=TO_DATE(' 2024-01-01 00:00:00',
                  'syyyy-mm-dd hh24:mi:ss') AND "T2"."DELIVERY_TIME"<TO_DATE(' 2025-01-01
                  00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       7 - filter("T1"."CREATE_TIME"<TO_DATE(' 2024-07-19 00:00:00',
                  'syyyy-mm-dd hh24:mi:ss') AND "T1"."CREATE_TIME">=TO_DATE(' 2024-01-01
                  00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
    

    and:

    EXPLAIN PLAN FOR
    MERGE INTO TABLE_1 T1
    USING (
      SELECT DISTINCT SMS_CENTER_ID
      FROM   TABLE_2 T2
      WHERE  T2.DELIVERY_TIME >= DATE '2024-01-01'
      AND    T2.DELIVERY_TIME <  DATE '2025-01-01'
    ) VIEW_1 
    ON (
          T1.SMS_CENTER_ID = VIEW_1.SMS_CENTER_ID
    )
    WHEN MATCHED THEN UPDATE
      SET   T1.DELETED = 1
      WHERE T1.CREATE_TIME >= DATE '2024-01-01'
      AND   T1.CREATE_TIME <  DATE '2024-01-01' + 200;
    
    SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
    

    Generates the plan:

    ----------------------------------------------------------------------------------
    | Id  | Operation              | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------
    |   0 | MERGE STATEMENT        |         |   366 |  8052 |     7  (15)| 00:00:01 |
    |   1 |  MERGE                 | TABLE_1 |       |       |            |          |
    |   2 |   VIEW                 |         |       |       |            |          |
    |*  3 |    HASH JOIN           |         |   366 |  9516 |     7  (15)| 00:00:01 |
    |   4 |     VIEW               |         |     1 |    13 |     4  (25)| 00:00:01 |
    |   5 |      SORT UNIQUE       |         |     1 |    11 |     4  (25)| 00:00:01 |
    |*  6 |       TABLE ACCESS FULL| TABLE_2 |     1 |    11 |     3   (0)| 00:00:01 |
    |   7 |     TABLE ACCESS FULL  | TABLE_1 |   366 |  4758 |     3   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
     
       3 - access("T1"."SMS_CENTER_ID"="VIEW_1"."SMS_CENTER_ID")
       6 - filter("T2"."DELIVERY_TIME">=TO_DATE(' 2024-01-01 00:00:00',
                  'syyyy-mm-dd hh24:mi:ss') AND "T2"."DELIVERY_TIME"<TO_DATE(' 2025-01-01
                  00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
    

    The first plan, where the filter is in the ON clause, has an additional filter and operates on (an estimate of) 201 rows. The second plan, has no additional filter and operates on all 366 rows so it is going to generate a greater volume of undo/redo logs as it works on all possible matches and not just the required subset of those rows.

    fiddle