Informatica CDC Mapping: Group Source Is Fetching Records Slowly


We have 37 Informatica sessions, most of which contain around 25 tables on average; a few sessions have just one table as source and target. Our source is Oracle and our target is a Greenplum database. We use PowerExchange 10.1, installed on the Oracle server, to fetch changed records.

We have noticed that sessions with more tables take longer to fetch the data and update the target. Does adding more tables introduce a delay in processing? If so, how can we tune the sessions to fetch records as quickly as possible?


Solution

  • We run 19 CDC mappings with between 17 and 90 tables each, and have recently had a breakthrough in performance. The number of tables is not the most significant limiting factor for us; PowerCenter and PowerExchange are. Our source is DB2 on z/OS, but that is probably not important. This is what we did:

    1) We increased the DTM buffer block size to 256 KB and the DTM buffer size to 1 GB or more; a 'complex' mapping needs many buffer blocks.
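    A back-of-envelope check makes this concrete. The "(sources + targets) x 2" minimum block count below is a commonly cited PowerCenter sizing guideline, used here as a rough heuristic rather than an exact formula, and the table count is the question's average, not a measured value.

    ```python
    # Rough check that the DTM buffer pool has enough blocks for a wide
    # CDC mapping. Treat the "(sources + targets) * 2" minimum as a
    # heuristic, not an exact PowerCenter formula.

    DTM_BUFFER_SIZE = 1 * 1024**3      # 1 GB, as suggested above
    BUFFER_BLOCK_SIZE = 256 * 1024     # 256 KB

    TABLES = 25                        # tables per session (question's average)
    sources_and_targets = TABLES * 2   # each table appears as a source and a target

    available_blocks = DTM_BUFFER_SIZE // BUFFER_BLOCK_SIZE
    required_blocks = sources_and_targets * 2

    print(f"available buffer blocks: {available_blocks}")   # 4096
    print(f"minimum blocks needed:   {required_blocks}")    # 100
    ```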

    2) We changed the connection attributes as follows:
       - Realtime flush latency = 86000 (the maximum setting)
       - Commit size in the session set extremely high (so the flush latency above is the deciding factor)
       - UOW count = -1 (for the same reason as above)
       - Maximum rows per commit = 0
       - Minimum rows per commit = 0
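    To make the interplay of these settings explicit, here is an illustrative summary as plain key/value pairs. Attribute names are paraphrased for readability, and the commit-size value is just a stand-in for "extremely high"; the real values are of course set in Workflow Manager, not in code.

    ```python
    # Illustrative view of the CDC connection/session attributes above.
    # The intent: leave exactly one commit trigger active (flush latency)
    # so the writer batches ~86 seconds of change data per commit.

    cdc_tuning = {
        "realtime_flush_latency_ms": 86000,  # max setting: commit roughly every 86 s
        "commit_size": 10**9,                # effectively "never"; latency decides instead
        "uow_count": -1,                     # disable UOW-based commits, same reason
        "maximum_rows_per_commit": 0,        # 0 = no row-count-based commit
        "minimum_rows_per_commit": 0,
    }
    ```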

    3) We set the session property 'recovery strategy' to 'fail task and continue workflow' and implemented our own solution to create a restart token file from scratch every time the workflow starts. Only slightly off topic, this is how we implemented it: an extra table (we call it a SYNC table) contains one row only, and that row is updated every 10 minutes on the source by a very reliable scheduled process (a small CICS program). The content of this table is written to the target database once per workflow, and an extra column added in the mapping holds the content of $$PMWorkflowName. Apart from the workflow-name column, the two DTL__Restart1 and *2 columns are written to the target as well. During startup of the workflow, a small reusable session runs before the actual CDC session; it reads the record for the current workflow from the SYNC table on the target side and creates the RESTART file from scratch (sketched below). [Please note that you will end up with duplicates covering up to 10 minutes before the workflow start time in the target. We accept that and aggregate them away in all mappings reading from these tables.]
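    A minimal sketch of the token-file rebuild follows, written as a standalone Python script rather than the reusable PowerCenter session described above. Table, column, and file names are hypothetical, the target is assumed to be PostgreSQL-compatible (such as the question's Greenplum), and the exact token-file syntax that PWXPC expects is documented by PowerExchange and must be matched; the write at the end is a placeholder.

    ```python
    # Sketch: rebuild the restart token file from the SYNC table at
    # workflow start. All names below are hypothetical.

    import psycopg2  # assumes a PostgreSQL-compatible target, e.g. Greenplum

    WORKFLOW_NAME = "wf_cdc_orders"              # would come from $$PMWorkflowName
    TOKEN_FILE = "/infa/restart/wf_cdc_orders.token"

    conn = psycopg2.connect("dbname=target host=gp-master user=infa")
    with conn, conn.cursor() as cur:
        # One row per workflow: the newest restart tokens that reached the target.
        cur.execute(
            "SELECT dtl__restart1, dtl__restart2 FROM sync_restart "
            "WHERE workflow_name = %s",
            (WORKFLOW_NAME,),
        )
        restart1, restart2 = cur.fetchone()

    # Rewrite the restart token file from scratch. Placeholder format only:
    # the real file must follow the PowerExchange-documented syntax.
    with open(TOKEN_FILE, "w") as f:
        f.write(f"{restart1}\n{restart2}\n")
    ```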

    Try experimenting with combinations of these settings and report what you experience. We now reach a maximum throughput of 10-100 million rows per mapping in a 10-minute interval. Our target is Netezza (a.k.a. PDA from IBM).

    One more thing I can tell you: every time a commit is triggered (every 86 seconds with the above settings), PowerCenter empties all of its writer buffers against all of the tables in one big commit scope. If any of those tables is locked by another process, you can end up with a lot of cascaded locking on the writer side, which will make the CDC seem slow.
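    If you suspect this kind of blocking, you can look for waiting lock requests on a PostgreSQL-compatible target such as the question's Greenplum (ours is Netezza, which works differently). A hedged sketch, assuming the standard pg_locks view is available; column names can vary across Greenplum/PostgreSQL versions.

    ```python
    # List lock requests that are currently waiting on the target; a long
    # list during a CDC commit points at writer-side cascaded locking.

    import psycopg2

    conn = psycopg2.connect("dbname=target host=gp-master user=infa")
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT locktype, relation::regclass, pid, mode "
            "FROM pg_locks WHERE NOT granted"
        )
        for locktype, relation, pid, mode in cur.fetchall():
            print(f"waiting: pid={pid} mode={mode} on {relation} ({locktype})")
    ```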