Search code examples
pentahokettledata-integrationpentaho-data-integration

Delete lines based on two tables


I have one problem, where i have one main table one other table. I have in one table (Accidents), attribute named Accident_ID (Values goes like this 1,4,7,10 and so on)

In other table named casualties, i have the same attribute Accident_ID (But values here goes like this 1,2,2, 3,4,4,4,5,6,7,8, 9, 10, 11 and so on). => There are several same ID-s because each Accident can have more Casualties.

The question is how can i delete in table casualties lines, which don't have same ID-s as in table Accidents (So i want to delete lines in table Casualties with ID-s 2,3,5,6,8,9,11 etc) => i have like 100k ID-s so i can't do manually.

Thank you for help.


Solution

  • Use the Merge Diff step. It compares two streams (or tables) and tell the rows which are added, deleted, modified or identical. Then Filter out the rows you do not want.