pentaho, pentaho-spoon, pentaho-data-integration

Checking replicated data in Pentaho


I have about 100 tables that receive replicated data, e.g. from an Oracle database. I would like to quickly check that the data replicated into the DB2 tables is the same as in the source system. Does anyone have a way to do this? I could create 100 transformations, but that is monotonous and time-consuming, so I would prefer to process the tables in a loop. My idea is to keep the queries in a table and read them from there one record at a time.


I read the data from a Table input step (sql_db2, sql_source, table_name) and write it to Copy rows to result. Next, I read a single record at a time and pass it into a loop.


But then I ran into a problem: I don't know how to compare the data dynamically, because each table has different columns.


Is this even possible?


Solution

  • You can inject metadata (in this case, the column and table names) into many steps in Pentaho. You create one transformation that collects the metadata and injects it into a second, template transformation. The template contains only the steps and some basic information; the bulk of the column information used by those steps lives in the transformation that injects the metadata.

    Check the official Pentaho documentation on Metadata Injection (MDI) and the basic metadata injection sample included in your PDI installation.
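Outside Pentaho, the same idea (discover the columns at run time instead of hard-coding them per table) can be sketched in Python. This is only an illustration, not a PDI transformation. The helper names `table_fingerprint` and `compare_all`, the sample tables, and the two in-memory SQLite databases standing in for Oracle and DB2 are all assumptions. The `checks` list mirrors the control table from the question (table_name, sql_source, sql_db2).

```python
import hashlib
import sqlite3


def table_fingerprint(conn, query):
    """Return (column names, row count, order-independent digest) for a query."""
    cur = conn.execute(query)
    # Column names are read from the cursor at run time,
    # so no per-table column mapping has to be maintained.
    columns = [d[0].lower() for d in cur.description]
    count = 0
    digest = 0
    for row in cur:
        count += 1
        row_hash = hashlib.sha256(repr(row).encode("utf-8")).digest()
        # Summing the row hashes makes the digest independent of row order,
        # and (unlike XOR) duplicate rows do not cancel each other out.
        digest = (digest + int.from_bytes(row_hash, "big")) % (1 << 256)
    return columns, count, digest


def compare_all(source_conn, target_conn, checks):
    """checks: iterable of (table_name, sql_source, sql_db2) rows."""
    results = {}
    for table_name, sql_source, sql_db2 in checks:
        results[table_name] = (
            table_fingerprint(source_conn, sql_source)
            == table_fingerprint(target_conn, sql_db2)
        )
    return results


def demo():
    # Two in-memory SQLite databases stand in for the source and the replica.
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    for conn in (source, target):
        conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
        conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
        conn.executemany(
            "INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")]
        )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 9.5, "new"), (2, 20.0, "paid")],
    )
    # Simulate a replication fault: one value differs in the target.
    target.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 9.5, "new"), (2, 20.0, "open")],
    )
    checks = [
        ("customers", "SELECT * FROM customers", "SELECT * FROM customers"),
        ("orders", "SELECT * FROM orders", "SELECT * FROM orders"),
    ]
    return compare_all(source, target, checks)


if __name__ == "__main__":
    print(demo())
```

With real Oracle and DB2 drivers, the same value may come back as different Python types (for example a decimal on one side and an integer on the other), so the rows would probably need to be normalized before hashing. A matching digest is also strong evidence of equality rather than proof; for tables flagged as different, a row-level comparison would still be needed to find the offending records.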