Search code examples
etlpentahokettle

How flexible is Pentaho for dynamic transformations? (user-input based parameters)


Based on the following use case, how flexible are pentaho tools to accomplish a dynamic transformation?

  1. The user needs to make a first choice from a catalog. (using a web interface)

  2. Based on the previously selected item, the user has to select from another catalog (this second catalog must be filtered based on the first selection).

steps 1 and 2 may repeat in some cases, (i.e. more than two dynamic and dependent parameters).

  1. From what the user chose in step 1 and 2, the ETL has to extract information from a database. The tables to select data from will depend on what the user chose in previous steps. Most of the tables have a similar structure but different name based on the selected item. Some tables have different structure and the user have to be able to select the fields in step 2, again based on the selection of step 1.

  2. All the selections made by the user should be able to be saved, so the user doesn't have to repeat the selection in the future, only re-run the process to get updated information based on the pre-selected filters. However he/she must be able to make a different selection and save it for further use if he/she wants different parameters.

Is there any web-based tool to allow the user to make all this choices based? I made the whole process using kettle but not dynamically, since all the parameters need to be passed when running the process in the console. The thing is, the end user doesn't know all the parameter values unless you show them and let them chose, and some parameters depend on a previous selection. When testing I can use my test-case scenario parameters, so I have no problem, but in production there is no way to know in advance what combination the user will chose.

I found a similar question, but it doesn't seem to require user input between transformation steps.

I'd appreciate any comments about the capabilities of Pentaho tools to accomplish the aforementioned use case.


Solution

  • I would disagree with the other answer here. If you use CDE it is possible to build a front end that will easily do those prompts you suggest. And the beauty of CDE is that a transformation can be a native data source via the CDA data access layer. In this environment kettle is barely any slower than executing the query directly.

    The key thing with PDI performance is to avoid starting the JVM again and again - when running in a web app you're already going so performance will be good.

    Also; The latest release of PDI5 will have the "light jdbc" driver (EE customers) which is basically a SQL interface on PDI jobs. So that again shows that PDI is much more these days than just a "batch" etl process.