Search code examples
testingpentaho

capture executed sql from input table in pentaho pdi


I am using pentaho for data migration testing. I have set a "table input" step where many parts of the query inside "table inputs" are variables. I have been looking for a way to capture that query after it gets executed during runtime. I was wondering if there is any specific system log variables for sql or is it to do with metadata. need help! Thanks


Solution

  • Maybe the following approach will help:

    • We assume a transformation reading a CSV file to get the dynamic portion of the SELECT statement (e.g. the columns) and setting the variable columns with it.

    enter image description here

    • The second transformation uses this variable to generate the SELECT statement and store it into the variable sql_statement.

    enter image description here

    • In the main transformation we use ${sql_statement} as the SELECT statement of the table input and write the data to an output file (that's the business process so to say). From the same input we copy the output to another path. There we add the current time as a field (use element "Get system data") and we add the generated SQL statement, join them as a cartesian product and group the result by the sql_statement. That way we can compute the first time and the last time that the statement was used. These results are written to a text file.

    enter image description here

    • The last thing we need is a job calling the three transformations sequentially.

    enter image description here

    This is a sample output:

    sql_statement;min_time;max_time
    SELECT my_column FROM test_table;2014/05/08 00:41:21.143;2014/05/08 00:41:21.144