
How to run Pentaho transformations in parallel and limit executors count


The task is to run a defined number of transformations (.ktr) in parallel. Each transformation opens its own database connection to read data. However, the database user we have to use is limited to 5 parallel connections, and let's assume this cannot be changed. So when I start the job depicted below, only 5 transformations finish successfully, and the other 5 fail with a DB connection error.

[Image: the job running the transformations in parallel]

I know I could redraw the job to have only 5 parallel sequences, but I don't like this approach, because the job has to be reworked whenever the thread count changes.

Is it possible to configure some kind of executor pool, so that the Pentaho job understands that even if 10 transformations are provided, only 5 of them (any 5) can be processed in parallel at a time?


Solution

  • The concept is the following:

    1. Catch the database connection error when a transformation run is attempted
    2. Wait a couple of seconds
    3. Retry the transformation

    Look at the attached transformation picture. It works for me.

    Disadvantages:

    • A lot of connection errors end up in the logs, which can be confusing.
    • As given, the solution can turn into an infinite loop (but it can be amended to avoid that).
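
The catch / wait / retry idea, with a cap on attempts so it cannot loop forever, can be sketched outside Pentaho roughly as follows. This is only an illustration of the control flow, not Pentaho's API: `ConnectionLimitError`, `run_with_retry` and the `run_transformation` callable are hypothetical names, and the defaults of 10 attempts and 2 seconds are arbitrary assumptions.

```python
import time


class ConnectionLimitError(Exception):
    """Hypothetical stand-in for the DB connection error a transformation raises."""


def run_with_retry(run_transformation, max_attempts=10, wait_seconds=2.0,
                   sleep=time.sleep):
    """Run a transformation, retrying while the connection limit is hit.

    run_transformation: hypothetical callable that starts one transformation
    and raises ConnectionLimitError when no DB connection is available.
    max_attempts bounds the loop so it cannot retry forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # Step 1: attempt the run and catch the connection error.
            return run_transformation()
        except ConnectionLimitError:
            if attempt == max_attempts:
                # Give up instead of looping indefinitely.
                raise
            # Step 2: wait a couple of seconds; step 3 is the next iteration.
            sleep(wait_seconds)
```

In a real job the same bound would probably be kept in a counter variable that is checked before looping back to the transformation; that detail is an assumption and is not taken from the picture.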

    [Image: the transformation implementing the retry]