pentaho · kettle · pdi · pentaho-spoon

Running multiple Kettle transformations on a single JVM


We want to use pan.sh to execute multiple Kettle transformations. After exploring the script I found that it internally calls the spoon.sh script that ships with PDI. The problem is that every time a new transformation starts, it creates a separate JVM for its execution (invoked via a .bat file). I want to group them into a single JVM to overcome the memory pressure that multiple JVMs put on the batch server.

Could somebody guide me on how I can achieve this, or share documentation/resources with me?

Thanks for the good work.


Solution

  • Use Carte. This is exactly what it is for. You can start up a server (on the local box if you like) and then submit your jobs to it. One JVM, one heap, shared resources.

    The benefit of that is scalability: when your box becomes too busy, just add another one, also running Carte, and start sending some of the jobs to that other server.

    There's an old but still current blog post here:

    http://diethardsteiner.blogspot.co.uk/2011/01/pentaho-data-integration-remote.html

    As well as documentation on the Pentaho website.

    Starting the server is as simple as:

    carte.sh <hostname> <port>
    
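    As a rough sketch, once the server is up you can ask that one Carte JVM to run a transformation over its REST API instead of launching a new JVM per run. The hostname, port, file path, and credentials below are illustrative (cluster/cluster is Carte's shipped default login; change it in production):

    ```shell
    # Start Carte on this machine: one JVM, one heap, for all submitted work
    sh carte.sh localhost 8081

    # From another shell: have the running Carte JVM execute a
    # transformation file that is visible on the server's filesystem.
    # runTrans is part of Carte's REST API; adjust path and credentials.
    curl -u cluster:cluster \
      "http://localhost:8081/kettle/runTrans/?trans=/path/to/my_transform.ktr"
    ```

    Every transformation submitted this way shares the single Carte heap, which is exactly the memory behaviour the question is after.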

    There is also a status page which you can use to query your Carte servers, so if you have a cluster of servers you can pick a quiet one to send your job to.
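    A sketch of polling that status page from a script (URL and credentials are the Carte defaults and may differ in your setup):

    ```shell
    # Ask the Carte server for its status as XML; a wrapper script can
    # parse this to pick the least-loaded server before submitting a job.
    curl -u cluster:cluster "http://localhost:8081/kettle/status/?xml=Y"
    ```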