Tags: oracle, performance, parallel-processing, insert, datapump

Data Pump import of a table taking too much time while inserting


First of all, sorry about my English; I'm Spanish and not very good at it.

I've been having some trouble exporting and importing a few schemas with Data Pump between two cloned databases (to do a one-off data refresh).

First, I ran an expdp with this parfile:

[oracle@ES-NAW-ORACLEVM-PRO backup]$ cat /u01/app/oracle/EXPORTS/FEDBPRE/EXP_FEDBPRE_para_CLON.par
directory=EXPORT_TEMP
dumpfile=EXP_FEDBPRE_%U.dmp
logfile=EXP_FEDBPRE.log 
schemas=AQADM,ASPNETOP,ASSISTANT,AUTOPUB,AUTOPUBOP,AVANTTIC,AVAN_SPA,DBAWKE,JAUSER,JURIMETRIA,JURIMETRIA_OLD,JURI_OPW,MONDB,NAGIOS,NASPOP,NTTAM,PREOP,PREOP_TEST,PRESENTATION,PRESENTATION_TEMP,PRESENT_ACT,PUB,PUBOP,SCOTT,TRACE,TRACEOP,WKE
FILESIZE=10g
parallel=4

And then:

expdp \'/ as sysdba\' PARFILE=/u01/app/oracle/EXPORTS/FEDBPRE/EXP_FEDBPRE_para_CLON.par

It took about 15 minutes to export all the schemas. I moved the DMP files to the cloned server, dropped the users with the CASCADE option on that database, and let the import run all night with this parfile:

[oracle@ES-NAW-ORACLEVM-PRO FEDBPRE_bkp]$ cat /backup/FEDBPRE_bkp/IMP_FEDBPRE_para_CLON.par
directory=EXPORT_TEMP
dumpfile=EXP_FEDBPRE_%U.dmp
logfile=IMP_FEDBPRE.log
ignore=yes
PARALLEL=8
And then:

impdp \'/ as sysdba\' PARFILE=/backup/FEDBPRE_bkp/IMP_FEDBPRE_para_CLON.par

The next day I checked on it: the import had taken about 4h30min to finish. That seemed like too much time given that the export took 15 minutes, so I re-ran the import to see in real time what was happening.

While it was running, I monitored how it was going on the database, looking for the sessions working on it with this query:

select s.sid, s.module, s.state, substr(s.event, 1, 21) as event, 
s.seconds_in_wait as secs, substr(sql.sql_text, 1, 30) as sql_text
from v$session s
join v$sql sql on sql.sql_id = s.sql_id
where s.module like 'Data Pump%'
order by s.module, s.sid;

At the beginning, it looked like everything was working well:

Import: Release 12.1.0.2.0 - Production on Mon Jan 16 13:44:55 2023

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Legacy Mode Active due to the following parameters:
Legacy Mode Parameter: "ignore=TRUE" Location: Parameter File, Replaced with: "table_exists_action=append"
Master table "SYS"."SYS_IMPORT_FULL_02" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_FULL_02":  SYS/******** PARFILE=/backup/FEDBPRE_bkp/IMP_FEDBPRE_para_CLON.par 
Processing object type SCHEMA_EXPORT/USER
Processing object type SCHEMA_EXPORT/SYSTEM_GRANT
Processing object type SCHEMA_EXPORT/ROLE_GRANT
Processing object type SCHEMA_EXPORT/DEFAULT_ROLE
Processing object type SCHEMA_EXPORT/TABLESPACE_QUOTA
Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA
Processing object type SCHEMA_EXPORT/SYNONYM/SYNONYM
Processing object type SCHEMA_EXPORT/TYPE/TYPE_SPEC
Processing object type SCHEMA_EXPORT/SEQUENCE/SEQUENCE
Processing object type SCHEMA_EXPORT/SEQUENCE/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type SCHEMA_EXPORT/TABLE/TABLE
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
. . imported "PUB"."PUBLICATIONS"                        1.582 GB 23242881 rows
. . imported "ASSISTANT"."ASSIST_NODES_RESOURCES"        1.319 GB 74670288 rows

And using the query, everything looked normal:

  SID MODULE            STATE               EVENT                       SECS SQL_TEXT
----- ----------------- ------------------- --------------------- ---------- ------------------------------
  312 Data Pump Master  WAITING             wait for unread messa          1 BEGIN :1 := sys.kupc$que_int.r
   65 Data Pump Worker  WAITING             log file switch (chec         46  BEGIN    SYS.KUPW$WORKER.MAIN
   75 Data Pump Worker  WAITING             log file switch (chec         39  BEGIN    SYS.KUPW$WORKER.MAIN
  127 Data Pump Worker  WAITING             log file switch (chec         55  BEGIN    SYS.KUPW$WORKER.MAIN
  187 Data Pump Worker  WAITING             wait for unread messa          4 BEGIN :1 := sys.kupc$que_int.t
  187 Data Pump Worker  WAITING             wait for unread messa          4 BEGIN :1 := sys.kupc$que_int.t
  194 Data Pump Worker  WAITING             wait for unread messa          4 BEGIN :1 := sys.kupc$que_int.t
  194 Data Pump Worker  WAITING             wait for unread messa          4 BEGIN :1 := sys.kupc$que_int.t
  247 Data Pump Worker  WAITING             wait for unread messa          3 BEGIN :1 := sys.kupc$que_int.t
  247 Data Pump Worker  WAITING             wait for unread messa          3 BEGIN :1 := sys.kupc$que_int.t
  249 Data Pump Worker  WAITING             direct path sync               1 INSERT /*+ APPEND PARALLEL("TR
  301 Data Pump Worker  WAITING             log file switch (chec         55 INSERT /*+ APPEND PARALLEL("TR
  361 Data Pump Worker  WAITING             log file switch (chec         55 INSERT /*+ APPEND PARALLEL("AS
  371 Data Pump Worker  WAITING             direct path sync               2 INSERT /*+ APPEND PARALLEL("TR
  418 Data Pump Worker  WAITING             direct path sync               2 INSERT /*+ APPEND PARALLEL("TR
  428 Data Pump Worker  WAITING             PX Deq: Execute Reply          1 INSERT /*+ APPEND PARALLEL("TR

But suddenly impdp seemed to freeze after table ASSISTANT.ASSIST_NODES, and I wanted to know what was going on:

[...]
. . imported "ASSISTANT"."ASSIST_NODES_DA"               307.6 MB 4322248 rows
. . imported "ASSISTANT"."ASSIST_TYPES_CHANGED"          21.15 MB 1249254 rows
. . imported "ASSISTANT"."STR_RESOURCES"                 845.4 MB 10994245 rows
. . imported "ASSISTANT"."ASSIST_NODES"                  6.526 GB 74638678 rows
  SID MODULE            STATE               EVENT                       SECS SQL_TEXT
----- ----------------- ------------------- --------------------- ---------- ------------------------------
  312 Data Pump Master  WAITING             wait for unread messa          1 BEGIN :1 := sys.kupc$que_int.r
   65 Data Pump Worker  WAITING             wait for unread messa          2 BEGIN :1 := sys.kupc$que_int.t
   65 Data Pump Worker  WAITING             wait for unread messa          2 BEGIN :1 := sys.kupc$que_int.t
   75 Data Pump Worker  WAITING             wait for unread messa          4 BEGIN :1 := sys.kupc$que_int.t
   75 Data Pump Worker  WAITING             wait for unread messa          4 BEGIN :1 := sys.kupc$que_int.t
  127 Data Pump Worker  WAITING             wait for unread messa          2 BEGIN :1 := sys.kupc$que_int.t
  127 Data Pump Worker  WAITING             wait for unread messa          2 BEGIN :1 := sys.kupc$que_int.t
  187 Data Pump Worker  WAITING             wait for unread messa          3 BEGIN :1 := sys.kupc$que_int.t
  187 Data Pump Worker  WAITING             wait for unread messa          3 BEGIN :1 := sys.kupc$que_int.t
  194 Data Pump Worker  WAITING             wait for unread messa          4 BEGIN :1 := sys.kupc$que_int.t
  194 Data Pump Worker  WAITING             wait for unread messa          4 BEGIN :1 := sys.kupc$que_int.t
  247 Data Pump Worker  WAITING             wait for unread messa          2 BEGIN :1 := sys.kupc$que_int.t
  247 Data Pump Worker  WAITING             wait for unread messa          2 BEGIN :1 := sys.kupc$que_int.t
  361 Data Pump Worker  WAITED KNOWN TIME   direct path sync               0 INSERT /*+ APPEND PARALLEL("AS
  428 Data Pump Worker  WAITING             wait for unread messa          2 BEGIN :1 := sys.kupc$que_int.t
  428 Data Pump Worker  WAITING             wait for unread messa          2 BEGIN :1 := sys.kupc$que_int.t

I looked up the session with SID=361; it was executing SQL_ID=bh6qct41h9bth, whose text was:

INSERT /*+ APPEND PARALLEL("ASSIST_NODES_METADATA",1)+*/ 
INTO RELATIONAL("ASSISTANT"."ASSIST_NODES_METADATA" NOT XMLTYPE) 
("NODE_ID", "AST_NODES_MT_TYPE", "XML_DATA")     SELECT "NODE_ID", 
"AST_NODES_MT_TYPE", SYS.XMLTYPE.CREATEXML("XML_DATA")      FROM 
"SYS"."ET$0169B1810001" KU$

Apparently the rows were being inserted with no parallelism (degree 1), even though I set PARALLEL=8 in the parfile. I don't know for sure, but the XML_DATA column of this table is probably what causes it.

Searching for the cause of this slowness, I found Oracle documentation Doc ID 2014960.1, which says that Oracle Database Enterprise Edition versions 11.2.0.3 to 12.1.0.2 can be affected by Bug 19520061.

So... they propose 3 solutions:

1. Upgrade the database to 12.2, when available, where issue is fixed.
- OR -
2. For earlier database releases please check Patch 19520061, if available 
for your platform and RDBMS version.
- OR -
3. Run the DataPump import job with a user other than SYS.

To confirm that this table is what makes the impdp take so long: I ran another import excluding that table, and it took about 20 minutes.

I tried the 3rd option with a user granted the DBA role and nothing changed, so... solution number 3 is dismissed.

I've seen some articles about increasing the table's DEGREE of parallelism, but that didn't work either.

I was wondering if there is a way to "force" Oracle to insert the rows with a specific degree of parallelism without setting it in the parfile, so that Oracle builds the insert like this, with the specific degree (8) after the table name:

INSERT /*+ APPEND PARALLEL("ASSIST_NODES_METADATA",8)+*/ INTO 
RELATIONAL("ASSISTANT"."ASSIST_NODES_METADATA" NOT XMLTYPE)...

Is there any solution to reduce this impdp time, besides applying the patch or upgrading?


Solution

  • If you want to focus on optimizing the slow import of the XML column of table ASSISTANT.ASSIST_NODES_METADATA, I believe you will get most of the benefit from storing the column as SECUREFILE (if it isn't already; given the performance issues you describe, I expect it is currently BASICFILE).

    There are mainly two ways: you can convert the XML column XML_DATA to SECUREFILE either on the source DB (before doing the export) or on the destination DB. Doing it on the source DB is preferable, because you do it once, and then no additional steps are needed each time you perform an export/import of your schemas. But whether you are able/allowed to make such a change depends on your application and other conditions.
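
    For example, a minimal check-and-convert sketch (assuming XML_DATA sits on a BASICFILE LOB; the tablespace name is only a placeholder, and for some XMLType storage models DBMS_REDEFINITION would be needed instead of a plain ALTER TABLE ... MOVE - test on a copy first):

      -- is the LOB segment behind XML_DATA already SECUREFILE?
      SELECT table_name, column_name, securefile
        FROM dba_lobs
       WHERE owner = 'ASSISTANT'
         AND table_name = 'ASSIST_NODES_METADATA';

      -- possible in-place conversion: rebuilds the LOB segment as SECUREFILE
      -- (indexes on the table become UNUSABLE after a MOVE and must be rebuilt)
      ALTER TABLE assistant.assist_nodes_metadata
        MOVE LOB (xml_data) STORE AS SECUREFILE (TABLESPACE users);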

    If the conversion to SECUREFILE has to be done on the destination DB, consider the following:

    • In some version of the impdp utility a new TRANSFORM parameter value was introduced, TRANSFORM=LOB_STORAGE:SECUREFILE - check whether your 12.1.0.2 version has it (a combined sketch follows after this list)
    • Later a similar option, LOB_STORAGE=SECUREFILE, was added - both of these parameters let you automatically convert a LOB column (your XML_DATA column) to SECUREFILE storage during the import (at the moment the table segment is created)
    • Another way is to import all schemas and their objects/data separately from the problematic table ASSISTANT.ASSIST_NODES_METADATA (like you already did). For that table, first import only its definition (using METADATA_ONLY mode, or just manually copying the DDL from the source DB), but change the STORE AS part of the CREATE TABLE statement to SECUREFILE
    • Then import only that table's data (DATA_ONLY mode)
    • You can also try to simulate a parallel import by using the QUERY parameter in your parameter file and starting 8 (or any other preferred degree of parallelism) separate impdp processes in parallel, each with an appropriate filter in the QUERY parameter. For the QUERY value, filter on the NODE_ID or AST_NODES_MT_TYPE column - analyze the contents of these columns to see whether the table can be split into roughly equal chunks
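
    For illustration, a rough sketch of those options (the job names, the MOD(NODE_ID, 8) split and the exact command lines are only assumptions - adjust them to your dump files and environment):

      # 1) convert LOBs to SECUREFILE automatically during import (12c TRANSFORM option),
      #    added to the import parameter file:
      TRANSFORM=LOB_STORAGE:SECUREFILE

      # 2) two-step import of only the problematic table: definition first, data afterwards
      impdp system PARFILE=IMP_FEDBPRE_para_CLON.par TABLES=ASSISTANT.ASSIST_NODES_METADATA CONTENT=METADATA_ONLY
      impdp system PARFILE=IMP_FEDBPRE_para_CLON.par TABLES=ASSISTANT.ASSIST_NODES_METADATA CONTENT=DATA_ONLY TABLE_EXISTS_ACTION=APPEND

      # 3) "manual" parallelism: several impdp jobs, each loading a different slice of the table
      impdp system PARFILE=IMP_FEDBPRE_para_CLON.par TABLES=ASSISTANT.ASSIST_NODES_METADATA CONTENT=DATA_ONLY \
            TABLE_EXISTS_ACTION=APPEND JOB_NAME=IMP_CHUNK_0 \
            QUERY=ASSISTANT.ASSIST_NODES_METADATA:\"WHERE MOD(NODE_ID, 8) = 0\"
      # ...repeat with MOD(NODE_ID, 8) = 1 ... 7, each with its own JOB_NAME, all running concurrently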

    And a few comments about the import process in general:

    • Set METRICS=YES and LOGTIME=ALL in your parameter file - then there is no need to sit and watch how the import is running: you will get a timestamp and a duration for each step, i.e. for each processed table, how long it took to import. You will also see which "path" Data Pump selected for each table - either CONVENTIONAL or DIRECT_PATH insert (a combined sketch follows after this list)
    • It is also worth excluding optimizer statistics, for both tables and indexes, via the EXCLUDE=TABLE_STATISTICS,INDEX_STATISTICS parameter (applies to both the export and import phases). The rationale: it is good practice anyway to re-gather optimizer statistics for all objects after importing them into the new DB, so importing statistics is useless if they will be overwritten by a fresh gathering run, and excluding them saves a measurable amount of time during the import
    • In one of the outputs you posted there were long waits on the event "log file switch (checkpoint incomplete)" - it seems your redo log configuration could not keep up with the import load. It is worth reviewing the redo log configuration and perhaps adding more redo log groups and/or using larger redo log members. At the same time, there is no need to chase the (sometimes) recommended redo log switch frequency of about every 20 minutes / 3 times per hour, since your import is a rare event - some balance is needed
    • Another trick sometimes used: consider switching the DB into NOARCHIVELOG mode for the import phase and, once the import is complete, switching back to ARCHIVELOG mode and performing an immediate DB backup
    • Alternatively, if you don't want to switch between NOARCHIVELOG/ARCHIVELOG modes (that requires 2 DB restarts), you can achieve almost the same effect by putting the tablespace(s) where the imported segments will be stored into NOLOGGING mode, performing the import, switching the tablespaces back to LOGGING mode, and then taking an immediate DB backup
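
    As a combined sketch of those settings (tablespace, file names and sizes are placeholders):

      # extra lines for the export/import parameter files
      METRICS=YES
      LOGTIME=ALL
      EXCLUDE=TABLE_STATISTICS,INDEX_STATISTICS

      -- review the redo logs and, if they are too few/small, add larger groups
      SELECT group#, bytes/1024/1024 AS mb, members, status FROM v$log;
      ALTER DATABASE ADD LOGFILE GROUP 4 ('/u01/oradata/FEDBPRE/redo04a.log') SIZE 2G;

      -- reduce redo during the import without restarting the DB
      ALTER TABLESPACE users NOLOGGING;
      --   ...run impdp here...
      ALTER TABLESPACE users LOGGING;
      -- then take a backup immediately: NOLOGGING direct-path loads cannot be recovered from archived redo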

    A few URLs to read, about what smart guys recommend for Data Pump import optimization:

    P.S. I believe you want to optimize the import phase because you plan to run it periodically (for example, to refresh the destination DB with recent data from the source DB). In other words, the export/import exercise is meant to be a periodic task, not a one-time task?