java, talend

Is OnComponentOk thread-safe in Talend?


I have a question about parallelism in Talend. I am building an ETL job that deletes multiple files in parallel and then updates a DB table.

My job runs: tFlowToIterate === iterate in parallel x10 ===> tDelete ===OnComponentOk===> tDBRow

My tDBRow requires variables defined by the tFlowToIterate, and it works! However, I am unclear on why it works. How does Talend ensure that each tDBRow sees the appropriate value in this setup?

My theory is that a tDBRow on an OnComponentOk link acts like a child that runs within the tDelete iteration.

Can anyone explain how and why this works?


Solution

  • When Talend starts a new thread, it creates a copy of the globalMap. Your values are in the globalMap, so each tDBRow will have its own globalMap. (If you store a lot of data in the globalMap, this can result in higher memory usage.)

    What you need to be careful about is the other direction, as the globalMap is write-synchronized: if you put a value into it inside a thread, it is written to the parent as well. In practice this means that if you try to increment a variable in the globalMap, your parallel threads will all see the value 0 and each write back 1, so the 11th thread will start with 1.

    To avoid this (e.g. when you put (hadError, true) and then check for hadError later), make sure that when a thread starts you initialize the values that your thread's logic depends on. If the logic is more complex, make that subjob a new job so the globalMap can't become corrupted.
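
    The behaviour described above can be sketched in plain Java. This is a simplified model, not Talend's generated code. The class `GlobalMapSketch`, the keys `current_file` and `counter`, and the file names are all made up for illustration. It assumes every parallel iteration takes its snapshot of the parent map before any worker has written back, which matches the "all see 0, all write 1" case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalMapSketch {

    /** What each worker read from its snapshot, plus the counter left in the parent map. */
    public static final class Result {
        public final List<String> seenFiles;
        public final int finalCounter;

        Result(List<String> seenFiles, int finalCounter) {
            this.seenFiles = seenFiles;
            this.finalCounter = finalCounter;
        }
    }

    public static Result run(List<String> files) {
        // Stand-in for the parent globalMap (hypothetical keys, not Talend's real ones).
        Map<String, Object> parent =
                Collections.synchronizedMap(new HashMap<String, Object>());
        parent.put("counter", 0);

        String[] seen = new String[files.size()];
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < files.size(); i++) {
            final int slot = i;
            // The iterate step publishes the current value, then the worker
            // receives a private copy of the map taken at that moment.
            parent.put("current_file", files.get(i));
            final Map<String, Object> local = new HashMap<>(parent);

            workers.add(new Thread(() -> {
                // "tDelete" and the "tDBRow" behind OnComponentOk run in the same
                // thread, so both read the same snapshot and the same file name.
                seen[slot] = (String) local.get("current_file");

                // Lost update: the read comes from the snapshot (always 0 here),
                // but the write goes back to the shared parent map.
                int next = (Integer) local.get("counter") + 1;
                parent.put("counter", next);
            }));
        }

        for (Thread t : workers) {
            t.start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return new Result(Arrays.asList(seen), (Integer) parent.get("counter"));
    }

    public static void main(String[] args) {
        Result r = run(Arrays.asList("a.csv", "b.csv", "c.csv"));
        // prints: [a.csv, b.csv, c.csv] counter=1
        System.out.println(r.seenFiles + " counter=" + r.finalCounter);
    }
}
```

    Each worker sees its own file name because it reads from its private snapshot, while the counter ends at 1 instead of 3 because every worker incremented the stale value 0. In this model, an accumulator that must be correct across threads needs something other than read-then-put on the map, which is why the answer suggests initializing per-thread values or moving the logic into a separate job.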