Search code examples
pentahoamazon-redshiftinsert-update

Pentaho update/insert


I am trying to have a setup in Pentaho where :

  1. My source data is in MySQL DB and target database is Amazon redshift.

  2. I want to have incremental loads on Redshift database table, based on the last updated timestamp from MySQL DB table.

Primary key is student ID.

Can I implement this using update/insert in Pentaho ?


Solution

  • Insert/Update step in Pentaho Data Integration serves the purpose of inserting the row if it doesn't exist in the destination table or updating it if it's already there. It has nothing to do with incremental loads, but if your loads should be inserting or updating the record based on some Change Data Capture mechanism then this is the right step at the end of the process.

    For example you could go one of two ways:

    • If you have a CDC then limit the data at Table Input for MySQL since you already know the last time a record has been modified (last load)
    • If you don't have a CDC and you are comparing entire tables then go for joining the sets to produce rows that has changed and then perform a load (slower solution)