Search code examples
pentahoetlkettlepentaho-spoonpentaho-data-integration

How should I control the duplication of calculation version using Pentaho?


I have a table result_slalom where data is populated via ETL Jobs of Pentaho. When ETL runs for the first time it creates version-1.

Now, if data is changed after new calculations it becomes version-2.

I need to make changes only in the Calculation Version -2 and no more than 2 versions should be there in the table result_slalom. ( Version-1 and Version-2 )

So the logic is : Check if data exists in table

o

When data exists and existing version is 1, then set the version of new data=2
--> Insert new dataset
o   When data exists and existing version is 2, then set the version of new data=2
--> Update existing dataset
o   When no data exists, then set version = 1
--> Insert new dataset

How do I make my Pentaho formula for this logic?

currently it is:

if([VersionInDB]=1;[Calculationversion];[VersionInDB]+1)

Solution

  • The dimension lookup/update is a step which does exactly that.

    In addition it has validity dates : at the time version2 is created, version1 receives a end-date of now and version2 receives a start-date of now. It make it easy to retrieve historical information with a where date between start-date and end-date. Plus, you have a single button that writes the create/alter table and create index for you.

    The other neat solution is to put a trigger on the tables.

    Forget to reinvent the wheel in that direction. Although I usually like to inventing the wheel, it is one case on which redeveloping the logic will lead you in an infinite number of tests and bugs.