Search code examples
pentahokettledata-integration

PDI Kettle/Spoon Table to foreign key matching


I have a sources table that has ID and Source(varchar)

1 Facebook
2 Twitter
3 Google

I have incoming data that has Source(varchar) and Views(Int)

Facebook 10
Twitter 12
Reddit 14

I want the kettle job to do this:

  1. Check if the source exists in the source table, and if so replace add a field of type INT called sourceID with the respective ID from source
  2. If it doesn't exist add it to the source table.

E.g. from the above data the result should be this

sourceID,Views
1,10
2,12
4,14 (Reddit wasn't in the table so it created it and the autoincrement gives it ID 4).

I am having trouble finding the right steps to achieve this


Solution

  • My answer uses two transformations: in the first steps an intermediate result is created by joining the sources and the keys using the match key source. Also the maximum source id is determined and stored in a variable. Note the two sort elements and the merge element can be replaced by a single "merge in memory" element.

    First transformation: merge sources and views

    In the second transformation the intermediate result is read and depending on whether there is a existing source either an update is performed on existing data or a new entry is inserted in to the sources table. The source id of the new entry is computed as a sum of the maximum id retrieved from the variable and a sequence starting in one.

    Second transformation: insert and update