Search code examples
etltalendtalend-mdm

An issue of partial insertion of data into the target when job fails


We have 17 records data set in one of the source tables in which we have erroneous data in the 14th record, which causes the job failure. Then, in the target only 10 records would be inserted as the commit size given as “10” in the mysqloutput component and the job failed. In the next execution after correcting the error record, job will fetch all the 17 records with successful execution. Due to which there will be duplicates in the target.

we tried :

To overcome this, we have tried with tmysqlrollback component in which we have included the tmysqlconnection and tmysqlcommit components.

Q1 : Is there any other option to use tmysqlrollback without using the tmysqlconnection and tmysqlcommit components?

Explored the tmysqlrollback and commit component from the documentation

https://help.talend.com/reader/QgrwjIQJDI2TJ1pa2caRQA/7cjWwNfCqPnCvCSyETEpIQ

But still looking for clue how to design the above process efficient manner.

Q2 : Also, We'd like to know about the RAM usage and disk space consumption from the performance perspective.

Any help on it would be much appreciated ?


Solution

    1. No, the only way to do transactions in Talend is to open a connection using tMysqlConnection, then either commit using a tMysqlCommit or rollback using tMysqlRollback.
    2. Without knowing what you're doing in your job (lookups, transformations..etc), it's hard to advise you on the ram consumption and performance. But if you only have a source to target, then ram consumption should be minimal (make sure you enable stream on the tMysqlInput component). If you have another database as your source, then ram consumption depends on how that database driver is configured (jdbc drivers usually accept a parameter to tell it to only fetch a certain number of records at a time).
      Lookups and components that process data in memory (tSortRow, tUniqRow, tAggregateRow..etc) are what causes memory issues, but it's possible to tweak their usage (using disk among other methods).