Search code examples
pentahokettlepdi

Is there any step or logic which will maintain SCD type 1 apart from Insert/Update step in pentaho?


Is there any step or logic which will maintain SCD type 1 apart from Insert/Update step.As for maintaining this for large data it is taking huge time. It is a simple step where max(update time) from table we are getting and the data after that update date is taken and simply inserted in the target table based on business key in Insert/Update. A detail explanation would be helpful


Solution

  • For pure dimension control you can use the Combination lookup/update or the Dimension Lookup/update (with update type = update or punch through, it behaves as a SCD type I).

    However, they will always have the same problem: round trips to the DB, costing you speed. They somewhat mitigate it by caching, but it may not be enough, depending on how many updates may be required.

    Another alternative is using a combination table output + update steps using the following trick:

    1. Create a unique index on the DB for the fields that are the keys of the SCD;
    2. Try inserting all records using a Table output step. Those rows that already exist in the DB will be rejected.
    3. Add a Update step after the table output, but instead of connecting it with a black hop, the "main output of step", connect it via the error handling red hop.

    Rows that don't exist on the DB will be inserted by the Table output without doing any lookups. those that fail (because they already exist), will be sent to the update step.