Tags: postgresql, pentaho, pentaho-data-integration

Pentaho Data Integration 9.3 - Postgres Incremental load


I'm using Pentaho Data Integration to load incremental data from a production Postgres database into a staging Postgres database. To do this, I fetch the max timestamp from the staging DB and then query the production DB for records with a timestamp greater than that max value.
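For reference, the staging-side query is just a MAX() over the change-tracking column. A minimal sketch, assuming a hypothetical table orders with an updated_at timestamp column (the real table and column names are not given in the question):

    -- Run against the staging DB to get the current high-water mark
    SELECT COALESCE(MAX(updated_at), '1970-01-01'::timestamp) AS max_ts
    FROM orders;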

To get the max timestamp I use a Table Input step, and for the second query that pulls records from the production DB I use a Database Join step (I also tried a Table Input), where the date parameter is replaced with the value from the previous step.
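In PDI, a Database Join (or a Table Input with "Insert data from step" set) binds values arriving from the previous step through ? placeholders, so the production-side query is roughly the following, using the same assumed names:

    -- Run against the production DB; ? is bound to max_ts from the previous step
    SELECT *
    FROM orders
    WHERE updated_at > ?
    ORDER BY updated_at;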

I tried putting a limit of 100k records on the select from the production DB and that run completed, but without the limit the step does not finish even after 90+ hours, and it doesn't report any error either.
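The 100k test amounts to appending a LIMIT to that same query, e.g.:

    SELECT *
    FROM orders
    WHERE updated_at > ?
    ORDER BY updated_at
    LIMIT 100000;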

Running the count query returns in 15-20 minutes. There are more than 7 million records to push for the initial load. Once these 7 million records are loaded, the job will run every hour.
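The count mentioned here is presumably the same filtered select wrapped in COUNT(*) (placeholder names as before):

    -- Reported to return in 15-20 minutes on this data set
    SELECT COUNT(*)
    FROM orders
    WHERE updated_at > ?;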

To load the data into the staging Postgres DB I use the PostgreSQL Bulk Loader step.
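The PostgreSQL Bulk Loader step streams rows into the target table via COPY rather than row-by-row INSERTs; the SQL equivalent of what it issues is roughly the following (the column list is illustrative only):

    -- Equivalent of what the bulk loader performs under the hood
    COPY orders (id, customer_id, amount, updated_at)
    FROM STDIN WITH (FORMAT csv);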


Solution

  • I found the solution: it was a connectivity issue, the connection was timing out. Adding KeepAlives = 1 to the options in the connection panel fixed the problem.
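For context, options added in the connection panel are appended as parameters to the JDBC URL. Depending on the driver, the keep-alive setting may be spelled keepalives=1 (libpq-style, as in the answer above) or tcpKeepAlive=true (the name documented for the stock PostgreSQL JDBC driver); the resulting URL looks roughly like this, with host, port, and database name as placeholders:

    jdbc:postgresql://prod-host:5432/proddb?tcpKeepAlive=true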