Search code examples
sqloracledatabase-administrationredo-logs

Would Insert statement perform better than a merge statement in Oracle


May I ask would Insert statement perform better than a merge statement in Oracle if we don't need to do any update?

I understand that a single merge statement will usually a preferred option if we need to run both insert and update statement at the same time for the same table. But what if I only have an insert statement?

I have 20 tables like this

Create Table Txn_History nologging (
ID number,
Comment varchar2(300),
... (Another 20 columns),
Std_hash raw(1000) 
);

Alter table Txn_History add constraint Order_line_PK Primary (Std_hash);

And 20 staging tables like this

Create Table Order_line_staging nologging (
ID number,
Comment varchar2(300),
... (Another 20 columns),
Std_hash raw(1000) 
);

Alter table Order_line_staging add constraint Order_line_PK Primary (Std_hash);

Each Txn_History table now have 40 million rows and I am going to insert another 50 million rows from each corresponding Txn_History_staging in multiple batches. Each batch have 1 million rows. At the end of each iteration, the staging table will be dropped with purge statement.

Following is my merge statement

Merge into Txn_History Target
using Txn_History_Staging source on
    (source.std_hash = Target.std_has)
when not matched then
   insert(
      ID,
      Comment,
      ... (Another 20 columns),
      std_hash
   ),
   values
   (
       Target.ID,
       Target.comment,
       ... (Another 20 columns),
       Target.std_hash
   );

My database is on archivelog mode (FORCE_LOGGING = 'NO') and I noticed each iteration takes 2 hours to run and still generated 25GB archive log event nologging is on.

So I am suspecting the merge statement have generated archivelog.

Would it be better, if I am using following insert statement instead:

insert /*+append */ into Txn_History Target
select
    *
from
   Txn_History_staging Source
where
    Source.std_hash not in (select distinct std_hash  from txn_history);

Would it have a better performance (i.e. run faster AND ALSO generate less logs)?

Thanks in advance!


Solution

  • Most likely is that your INSERT will perform better, but the real proof is in the execution. If you're seeing as HASH ANTI JOIN as the mechanism to derive the required rows, that is typically close to optimal.

    Archive generation is a consequence of changes recorded in the database, so a standard MERGE (that ends up doing inserts) or an INSERT will generate very similar volume of archive.

    The differentiating factor is more likely the APPEND hint. By itself (on a table with the default of LOGGING), you will probably see either

    • no/tiny reduction in archive log if the table is not indexed
    • some reduction in archive log if the table is indexed

    The real gains are from you make the table NOLOGGING before you do the load, and set indexes to UNUSABLE before load, which can shrink the archivelog generation to almost none, but of course you need to understand the consequences of this

    • you'll need to take a backup of the impacted datafiles
    • you might need to take some reparative action on a standby database
    • you need to rebuild those indexes

    Like most things, its basically a cost/benefit analysis