
Informix connection speed - Informatica PowerCenter vs Python by JDBC


I have a workflow in Informatica PowerCenter that transfers data between Informix and Oracle. Informatica needs about 20 seconds to do this work (6 tables, daily updates filtered by the current date).

I tried to achieve the same in Python using JDBC, and it is extremely slow.

For example, the first of these tables gets about 100,000 rows/day; in Python, fetching even 10,000 rows takes about a minute.

Is it normal that Informatica is so much quicker? Can I somehow speed up my Python script?

Example:

import jaydebeapi
conn = jaydebeapi.connect("com.informix.jdbc.IfxDriver",
                           "jdbc:informix-sqli://server:port/cms:INFORMIXSERVER=x;user=x;password=x",
                           ["chancel", "chancel"],
                           r"C:\app\informix-jdbc-complete-4.50.4.1.jar")

curs = conn.cursor()
curs.execute("select * from table")
curs.fetchall()

Solution

  • Several factors can explain why Informatica is faster:

    1. Informatica can use native drivers, such as the Oracle or Informix client drivers, to connect directly to the server. These are typically much faster than a JDBC driver bridged into Python.
    2. Informatica uses multi-threading to read from the source, do the transformation, and load into the target in parallel, whereas the Python script above runs every step sequentially in a single thread.
    3. Informatica is built for this kind of ETL, so it is optimized for both memory and processing. You can control its memory parameters when the data size grows, and it can also create indexes etc. to make loading faster.
    4. The Python script calls fetchall(), which holds the entire result set in memory at once. That scales poorly for larger tables and also needs a powerful CPU.
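
On the Python side, point 4 mostly concerns fetchall(). Below is a minimal sketch of reading an already-executed cursor in fixed-size batches with the standard DB-API fetchmany() call, which jaydebeapi cursors also provide. The helper name iter_batches, the demo table, and the batch sizes are hypothetical, and sqlite3 is used only as a stand-in so the example runs without an Informix server. Whether this closes the gap to Informatica depends on the driver and network, so treat it as something to measure rather than a guaranteed fix.

```python
import sqlite3
from typing import Any, Iterator, List, Sequence


def iter_batches(cursor: Any, batch_size: int = 10_000) -> Iterator[List[Sequence[Any]]]:
    """Yield lists of rows from an already-executed DB-API cursor.

    Only one batch is held in memory at a time, unlike fetchall().
    """
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:  # an empty result means the cursor is exhausted
            break
        yield rows


# Stand-in database (assumption): replace with the jaydebeapi connection
# from the question when running against Informix.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (id INTEGER, day TEXT)")
demo_conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(i, "2024-01-01") for i in range(25)],
)

demo_cur = demo_conn.cursor()
# Select only the needed columns and filter by date in SQL,
# instead of "select *" over the whole table.
demo_cur.execute("SELECT id, day FROM demo WHERE day = ?", ("2024-01-01",))

batch_sizes = [len(batch) for batch in iter_batches(demo_cur, batch_size=10)]
print(batch_sizes)  # [10, 10, 5]

demo_conn.close()
```

With the connection from the question, the same helper would be called as iter_batches(curs, 10_000) after curs.execute(...), and each batch could be written to Oracle before the next one is fetched.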