I have searched a lot, and I notice that most examples use a job or subjob to implement a loop.
Isn't that a waste of system resources? Is it a good approach?
I write code and am familiar with loop constructs such as 'while', 'for', 'foreach' and language-specific iterators. For various reasons I need to use the Pentaho Kettle ETL tool to get my work done, and I notice that Kettle provides a scripting tool, JavaScript, that lets developers write JavaScript or Java code.
Should we use a JavaScript step rather than a job or subjob to implement loops? In most cases I only need to iterate over a small data stream. Is there another, simpler way to implement a loop?
Why doesn't Kettle provide a step like 'iterator'? Is it possible to implement an iterator through the Kettle developer API?
Thanks in advance.
Pentaho Data Integration uses a flow-based design, meaning you define what happens to each record in a stream that goes through the transformation or job. In most cases this already replaces your basic for/while loop, with the added bonus of a high degree of parallelization, as all steps in a transformation run simultaneously.
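As a rough mental model only (this is not Kettle's API or its actual implementation), the sketch below treats each step as a function that handles one row, runs every step in its own thread, and connects the steps with bounded queues, so all steps work at the same time. The names `run_transformation` and `_step_worker` are hypothetical, and there is no error handling.

```python
from __future__ import annotations

import queue
import threading
from typing import Callable, Iterable, Optional

Row = dict
# A step only says what happens to ONE row: return a new row, or None to drop it.
StepFn = Callable[[Row], Optional[Row]]

_END = object()  # end-of-stream marker (this sketch's own convention)


def _step_worker(fn: StepFn, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Run one step in its own thread; the per-row loop lives here, not in fn."""
    while True:
        row = inbox.get()
        if row is _END:
            outbox.put(_END)  # pass end-of-stream on to the next step
            return
        result = fn(row)
        if result is not None:
            outbox.put(result)


def run_transformation(rows: Iterable[Row], steps: list[StepFn]) -> list[Row]:
    """Start every step at once and stream rows through buffers between them."""
    # Bounded buffers between steps; the final one is unbounded so the
    # caller can finish feeding rows before it starts collecting them.
    buffers = [queue.Queue(maxsize=100) for _ in steps] + [queue.Queue()]
    threads = [
        threading.Thread(target=_step_worker, args=(fn, buffers[i], buffers[i + 1]))
        for i, fn in enumerate(steps)
    ]
    for t in threads:
        t.start()
    for row in rows:
        buffers[0].put(row)
    buffers[0].put(_END)
    out = []
    while (row := buffers[-1].get()) is not _END:
        out.append(row)
    for t in threads:
        t.join()
    return out


rows = [{"id": i, "amount": i * 10} for i in range(5)]
steps = [
    lambda r: {**r, "doubled": r["amount"] * 2},  # roughly what a calculator step does
    lambda r: r if r["amount"] >= 20 else None,  # roughly what a filter step does
]
result = run_transformation(rows, steps)
print([r["id"] for r in result])  # [2, 3, 4]
```

Neither step function contains a loop; the iteration is owned by the "engine", which is the point being made about flow-based design.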
Operations that affect the whole set of records, such as grouping, sorting and aggregating, are each handled by a single step, so again you never actually see the loop; it's implicit.
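To illustrate what "implicit" means here, this hedged sketch models a sort step followed by a group-and-sum step as two plain functions. The function names and field names are made up for the example; in Spoon you would configure the equivalent steps (for instance "Sort rows" then "Group by") with a key and an aggregate instead of writing any code.

```python
from __future__ import annotations

from itertools import groupby
from operator import itemgetter


def sort_rows(rows: list[dict], key: str) -> list[dict]:
    """Whole-stream step: needs every row before it can emit the first one."""
    return sorted(rows, key=itemgetter(key))


def group_by_sum(rows: list[dict], key: str, field: str) -> list[dict]:
    """Whole-stream step: one output row per key; assumes input is sorted on key."""
    return [
        {key: k, f"sum_{field}": sum(r[field] for r in group)}
        for k, group in groupby(rows, key=itemgetter(key))
    ]


# Choosing the key and the aggregate is all the "loop" you write.
sales = [
    {"customer": "b", "amount": 5},
    {"customer": "a", "amount": 1},
    {"customer": "b", "amount": 2},
    {"customer": "a", "amount": 3},
]
totals = group_by_sum(sort_rows(sales, "customer"), "customer", "amount")
print(totals)  # [{'customer': 'a', 'sum_amount': 4}, {'customer': 'b', 'sum_amount': 7}]
```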
Only rarely will you need a loop in a JavaScript step: for example, to combine an unknown number of fields, to parse invalid JSON/XML that the default steps choke on, or to work with other dynamic structures.
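The first case (an unknown number of fields) is where a loop legitimately appears, and note that it runs over the fields of a single row, not over the rows. This is a Python sketch of that per-row logic, not Kettle JavaScript; the `phone_` prefix and the `phones` output field are hypothetical.

```python
def combine_fields(row: dict, prefix: str, out_field: str, sep: str = ",") -> dict:
    """Per-row script: join every non-empty field whose name starts with prefix."""
    parts = [
        str(value)
        for name, value in row.items()  # loop over this row's fields only
        if name.startswith(prefix) and value not in (None, "")
    ]
    return {**row, out_field: sep.join(parts)}


row = {"id": 1, "phone_1": "555-0100", "phone_2": "", "phone_3": "555-0199"}
print(combine_fields(row, "phone_", "phones")["phones"])  # 555-0100,555-0199
```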
Jobs and subjobs are for control flow and reusability of components. They let you specify which transformations to run in which order under which conditions. You can implement loops in them, but it is often better to group your data instead and pass it to a subjob or transformation in batches.
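A minimal sketch of the batching idea, assuming fixed-size batches: instead of starting a subjob once per row, the stream is cut into groups and the subjob runs once per group. `run_subjob` is a hypothetical stand-in for whatever job or transformation processes a batch; here it only counts rows.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator


def batches(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Cut a row stream into lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def run_subjob(batch: list[dict]) -> int:
    """Hypothetical stand-in for a subjob/transformation handling one batch."""
    return len(batch)


rows = [{"id": i} for i in range(7)]
handled = [run_subjob(b) for b in batches(rows, 3)]
print(handled)  # [3, 3, 1]
```

Seven rows lead to three subjob runs instead of seven, which is where the saving in overhead comes from.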
In my experience, if your first solution involves a loop, you probably don't yet understand the flow-based options well enough. Drawing a flow chart that splits out all the cases often gives you a fair idea of what the transformation will look like in Spoon.
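Splitting out the cases usually means routing rows into separate streams instead of writing if/else inside a loop. In Spoon that would be separate hops out of a step such as "Switch / Case" or "Filter rows"; the Python below only models the routing, and `switch_case` and the example rules are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable


def switch_case(
    rows: Iterable[dict],
    rules: list[tuple[str, Callable[[dict], bool]]],
    default: str = "default",
) -> dict[str, list[dict]]:
    """Send each row to the first named stream whose predicate matches."""
    streams = defaultdict(list)
    for row in rows:  # in Spoon the engine does this; you only draw the branches
        target = next((name for name, matches in rules if matches(row)), default)
        streams[target].append(row)
    return dict(streams)


orders = [{"amount": 50}, {"amount": -20}, {"amount": 1500}, {"amount": 10}]
routed = switch_case(
    orders,
    [("refund", lambda r: r["amount"] < 0), ("large", lambda r: r["amount"] >= 1000)],
)
print({name: [r["amount"] for r in rs] for name, rs in routed.items()})
# {'default': [50, 10], 'refund': [-20], 'large': [1500]}
```

Each named stream then continues through its own steps, which is what the flow chart's branches turn into.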
If you add to your question an example of where you want to use a loop, perhaps I can show how to implement the same thing without one.
My answer to this other question is an example of a JavaScript step used to construct a JSON object iteratively. You'll notice that it does its job without me writing any loop syntax, because the JS step itself already runs once for each row that passes through it.
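The linked answer's code is not reproduced here. As a hedged Python model of the same pattern, assuming state that is initialised once and kept between rows: the per-row body never loops, it only adds the current row to an accumulator, and the object is serialised after the last row. `make_json_builder` and the field names are hypothetical.

```python
import json


def make_json_builder(key_field, value_field):
    """Return (per_row, finish) callbacks sharing one accumulator."""
    state = {}  # created once, survives between rows

    def per_row(row):
        # No loop here: this body only ever sees the current row.
        state.setdefault(row[key_field], []).append(row[value_field])

    def finish():
        # Called once after the last row to emit the finished object.
        return json.dumps(state)

    return per_row, finish


per_row, finish = make_json_builder("dept", "name")
for row in [{"dept": "a", "name": "x"}, {"dept": "b", "name": "y"}, {"dept": "a", "name": "z"}]:
    per_row(row)  # this loop stands in for the engine, not the script author
doc = finish()
print(doc)  # {"a": ["x", "z"], "b": ["y"]}
```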