
How does Pentaho's Copy Table wizard handle duplicate data?


I am trying to copy my MySQL database to a HANA database using Pentaho's Copy Table Wizard. It automatically created a workflow, but I'm not sure what happens if the destination database is already partially filled: will it handle duplicate rows, or will it just copy them anyway?


Solution

  • The Copy Table wizard doesn't check for duplicates. It simply runs a series of INSERT statements against the destination database. As the ETL developer, it's your job to make sure duplicate data is filtered out or, alternatively, updated on the target database. Look at the Insert/Update step. Performance will of course be much lower, because each row first triggers a database lookup, and depending on the result of that lookup either an INSERT or an UPDATE is issued.
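To illustrate the difference between the two behaviors in the answer, here is a minimal sketch in plain Python using the standard-library `sqlite3` module. It is not Pentaho code and does not use any Pentaho API. The table `target`, its `id`/`name` columns, and the helper functions are hypothetical, with `id` standing in for the lookup key you would configure in the Insert/Update step. `copy_plain_insert` mimics a copy that only issues INSERTs, while `copy_insert_update` mimics the per-row "look up, then insert or update" logic. As a simplification, this sketch always issues the UPDATE when the key exists.

```python
import sqlite3


def make_target(existing):
    """Create a hypothetical, partially filled destination table in memory."""
    conn = sqlite3.connect(":memory:")
    # No primary key on purpose, so duplicate rows are accepted silently.
    conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO target (id, name) VALUES (?, ?)", existing)
    return conn


def copy_plain_insert(rows, conn):
    """Plain copy: every source row becomes an INSERT, with no duplicate check."""
    conn.executemany("INSERT INTO target (id, name) VALUES (?, ?)", rows)


def copy_insert_update(rows, conn):
    """Insert/Update-style copy: look up each key, then INSERT or UPDATE."""
    inserted = updated = 0
    for key, name in rows:
        # One lookup query per row: this is where the extra cost comes from.
        found = conn.execute("SELECT 1 FROM target WHERE id = ?", (key,)).fetchone()
        if found:
            conn.execute("UPDATE target SET name = ? WHERE id = ?", (name, key))
            updated += 1
        else:
            conn.execute("INSERT INTO target (id, name) VALUES (?, ?)", (key, name))
            inserted += 1
    return inserted, updated


SOURCE = [(1, "alice"), (2, "bob"), (3, "carol")]
EXISTING = [(1, "alice"), (2, "bobby")]  # destination already partially filled

if __name__ == "__main__":
    plain = make_target(EXISTING)
    copy_plain_insert(SOURCE, plain)
    print(plain.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 5

    upsert = make_target(EXISTING)
    print(copy_insert_update(SOURCE, upsert))  # (1, 2)
    print(upsert.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 3
```

With the plain copy, rows 1 and 2 end up in the table twice. If the target table had a primary key, those duplicate INSERTs would typically be rejected with a constraint violation instead. The lookup-based copy leaves exactly one row per key but pays for an extra query on every row.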