Search code examples
pentahokettle

looping in a Kettle transformation


I want to repetitively execute an SQL query looking like this:

SELECT '${date.i}' AS d, 
  COUNT(DISTINCT xid) AS n
FROM table 
WHERE date 
  BETWEEN DATE_SUB('${date.i}', INTERVAL 6 DAY) 
    AND '${date.i}'
;

It is basically a grouping by time spans, just that those are intersecting, which prevents usage of GROUP BY.

That is why I want to execute the query repetitively for every day in a certain time span. But I am not sure how I should implement the loop. What solution would you suggest?

The Kettle variable date.i is initialized from a global variable. The transformation is just one of several in the same transformation bundle. The "stop trafo" would be implemented maybe implicitely by just not reentering the loop.

Here's the flow chart:

enter image description here


Solution

  • Flow of the transformation:

    enter image description here

    In step "INPUT" I create a result set with three identical fields keeping the dates from ${date.from} until ${date.until} (Kettle variables). (for details on this technique check out my article on it - Generating virtual tables for JOIN operations in MySQL).

    In step "SELECT" I set the data source to be used ("INPUT") and that I want "SELECT" to be executed for every row in the served result set. Because Kettle maps parameters 1 on 1 by a faceless question-mark I have to serve three times the same paramter - for each usage.

    The "text file output" finally outputs the result in a generical fashion. Just a filename has to be set.

    Content of the resulting text output for 2013-01-01 until 2013-01-05:

    d;n
    2013/01/01 00:00:00.000;3038
    2013/01/02 00:00:00.000;2405
    2013/01/03 00:00:00.000;2055
    2013/01/04 00:00:00.000;2796
    2013/01/05 00:00:00.000;2687
    

    I am not sure if this is the slickest solution but it does the trick.


    enter image description here