Search code examples
oracleindexinginsertdatabase-performanceoracle11gr2

Oracle 11G - Performance effect of indexing at insert


Objective

Verify if it is true that insert records without PK/index plus create thme later is faster than insert with PK/Index.

Note
The point here is not about indexing takes more time (it is obvious), but the total cost (Insert without index + create index) is higher than (Insert with index). Because I was taught to insert without index and create index later as it should be faster.

Environment

Windows 7 64 bit on DELL Latitude core i7 2.8GHz 8G memory & SSD HDD
Oracle 11G R2 64 bit

Background

I was taught that insert records without PK/Index and create them after insert would be faster than insert with PK/Index.

However 1 million record inserts with PK/Index was actually faster than creating PK/Index later, approx 4.5 seconds vs 6 seconds, with the experiments below. By increasing the records to 3 million (999000 -> 2999000), the result was the same.

Conditions

  • The table DDL is below. One bigfile table space for both data and index.
    (Tested a separate index tablespace with the same result & inferior overall perforemace)
  • Flush the buffer/spool before each run.
  • Run the experiment 3 times each and made sure the results were similar.

SQL to flush:

ALTER SYSTEM CHECKPOINT;
ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH BUFFER_CACHE;

Question

Would it be actually true that "insert witout PK/Index + PK/Index creation later" is faster than "insert with PK/Index"?

Did I make mistakes or missed some conditions in the experiment?

Insert records with PK/Index

TRUNCATE TABLE TBL2;
ALTER TABLE TBL2 DROP CONSTRAINT PK_TBL2_COL1 CASCADE;
ALTER TABLE TBL2 ADD  CONSTRAINT PK_TBL2_COL1 PRIMARY KEY(COL1) ;

SET timing ON
INSERT INTO TBL2
SELECT i+j, rpad(TO_CHAR(i+j),100,'A')
FROM (
  WITH DATA2(j) AS (
      SELECT 0 j FROM DUAL
      UNION ALL
      SELECT j+1000 FROM DATA2 WHERE j < 999000
  )
  SELECT j FROM DATA2
),
(
  WITH DATA1(i) AS (
      SELECT 1 i FROM DUAL
      UNION ALL
      SELECT i+1 FROM DATA1 WHERE i < 1000
  )
  SELECT i FROM DATA1
);
commit;

1,000,000 rows inserted.
Elapsed: 00:00:04.328 <----- Insert records with PK/Index

Insert records without PK/Index and create them after

TRUNCATE TABLE TBL2;
ALTER TABLE &TBL_NAME DROP CONSTRAINT PK_TBL2_COL1 CASCADE;

SET TIMING ON
INSERT INTO TBL2
SELECT i+j, rpad(TO_CHAR(i+j),100,'A')
FROM (
  WITH DATA2(j) AS (
      SELECT 0 j FROM DUAL
      UNION ALL
      SELECT j+1000 FROM DATA2 WHERE j < 999000
  )
  SELECT j FROM DATA2
),
(
  WITH DATA1(i) AS (
      SELECT 1 i FROM DUAL
      UNION ALL
      SELECT i+1 FROM DATA1 WHERE i < 1000
  )
  SELECT i FROM DATA1
);
commit;
ALTER TABLE TBL2 ADD CONSTRAINT PK_TBL2_COL1 PRIMARY KEY(COL1) ;

1,000,000 rows inserted.
Elapsed: 00:00:03.454 <---- Insert without PK/Index

table TBL2 altered.
Elapsed: 00:00:02.544 <---- Create PK/Index

Table DDL

CREATE TABLE TBL2 (
    "COL1" NUMBER,
    "COL2" VARCHAR2(100 BYTE),
    CONSTRAINT "PK_TBL2_COL1" PRIMARY KEY ("COL1")
) TABLESPACE "TBS_BIG" ;

Solution

  • It's true that it is faster to modify a table if you do not also have to modify one or more indexes and possibly perform constraint checking as well, but it is also largely irrelevant if you then have to add those indexes. You have to consider the complete change to the system that you wish to effect, not just a single part of it.

    Obviously if you are adding a single row into a table that already contains millions of rows then it would be foolish to drop and rebuild indexes.

    However, even if you have a completely empty table into which you are going to add several million rows it can still be slower to defer the indexing until afterwards.

    The reason for this is that such an insert is best performed with the direct path mechanism, and when you use direct path inserts into a table with indexes on it, temporary segments are built that contain the data required to build the indexes (data plus rowids). If those temporary segments are much smaller than the table you have just loaded then they will also be faster to scan and to build the indexes from.

    the alternative, if you have five index on the table, is to incur five full table scans after you have loaded it in order to build the indexes.

    Obviously there are huge grey areas involved here, but well done for:

    1. Questioning authority and general rules of thumb, and
    2. Running actual tests to determine the facts in your own case.

    Edit:

    Further considerations -- you run a backup while the indexes are dropped. Now, following an emergency restore, you have to have a script that verifies that all indexes are in place, when you have the business breathing down your neck to get the system back up.

    Also, if you absolutely were determined to not maintain indexes during a bulk load, do not drop the indexes -- disable them instead. This preserves the metadata for the indexes existence and definition, and allows a more simple rebuild process. Just be careful that you do not accidentally re-enable indexes by truncating the table, as this will render disabled indexes enabled again.