
Merge Join Pentaho Issue


I've got this question:

I have two sources, A and B, and a Merge Join step (join type INNER). The image shows what I am facing.

[Image not included]

I am getting the right identifiers, but for each of them the value from the last row is repeated n times.

I need to get all the identifiers from B that are present in A.

I know there are also the Database Join and Database Lookup steps, but they could be slow given that I have a lot of data to check.

Which step should I use to get the expected result in Pentaho?

Regards.


Solution

  • I could not replicate the issue.

    The most probable causes are:

    1. the input flows are not sorted on the join key;
    2. the first step (master) and the second step (follower) are switched;
    3. the keys are not set correctly (a misclick on a drop-down box happens easily).
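To see why unsorted inputs matter, here is a minimal Python sketch of a sort-merge inner join. It illustrates the general technique only and is not Pentaho's actual implementation; rows are plain dicts and the key field name `id` is a hypothetical choice. Because the algorithm only ever moves forward through both inputs, an unsorted input makes it skip matches silently instead of raising an error.

```python
# Minimal sort-merge inner join (illustration of the technique, not Pentaho's code).
# Assumption: rows are dicts and "id" is a hypothetical key field name.

def merge_join_inner(left, right, key="id"):
    """Inner-join two lists of dicts that are BOTH sorted ascending on `key`."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1  # left key is behind: advance the left input
        elif lk > rk:
            j += 1  # right key is behind: advance the right input
        else:
            # Find the run of right rows sharing this key...
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # ...and pair every left row with this key against that whole run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append({**left[i], **r})
                i += 1
            j = j_end
    return result


a = [{"id": 1}, {"id": 2}, {"id": 3}]
b = [{"id": 3, "val": "c"}, {"id": 1, "val": "a"}, {"id": 2, "val": "b"}]

joined_sorted = merge_join_inner(a, sorted(b, key=lambda r: r["id"]))
joined_unsorted = merge_join_inner(a, b)
print(joined_sorted)    # three rows, ids 1, 2 and 3
print(joined_unsorted)  # [{'id': 3, 'val': 'c'}]: the matches for 1 and 2 were skipped
```

No error is raised for the unsorted case; rows simply go missing, which is why sorting both flows on the join key comes first.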

    Now, I think your goal is to filter out of B all the rows whose identifier is not in A. I suggest reversing the flow: for each row of B, look up its identifier in A, and then filter out the rows whose identifier was not found in A.
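The reversed flow can be sketched in a few lines of Python. This is only an analogy for what the Spoon transformation would do (roughly a Stream Lookup with A as the lookup source, followed by a Filter rows step); the field names `id` and `name` are hypothetical. The lookup side is held in memory, so neither input needs to be sorted, and each B row keeps its own values.

```python
# Sketch of "look up each B identifier in A, then drop the ones not found".
# Assumption: rows are dicts; "id" and "name" are hypothetical field names.

def lookup_then_filter(a_rows, b_rows, key="id"):
    # Load the lookup source (A) once into an in-memory set of identifiers.
    a_ids = {row[key] for row in a_rows}
    # For each B row, look its identifier up in A and keep it only if found.
    # B's own fields are passed through unchanged.
    return [row for row in b_rows if row[key] in a_ids]


a = [{"id": 1}, {"id": 3}]
b = [
    {"id": 1, "name": "x"},
    {"id": 2, "name": "y"},
    {"id": 3, "name": "z"},
    {"id": 3, "name": "w"},
]
result = lookup_then_filter(a, b)
print(result)  # [{'id': 1, 'name': 'x'}, {'id': 3, 'name': 'z'}, {'id': 3, 'name': 'w'}]
```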

    As a general rule, prefer the Stream Lookup step. It is very fast and closer to the way people naturally think about the problem than SQL-style joins.

    If you need to retrieve more than one record for each input row, use the Merge Join step instead (and sort both input flows on the join key).
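The difference between the two approaches shows up when A holds several rows per identifier. In this hedged Python sketch (hypothetical fields `id` and `order`), a key-to-row dictionary, like any one-value-per-key lookup, returns a single match per identifier, while a merge over inputs sorted on the key returns every matching pair. Which duplicate a real Stream Lookup keeps is not something this sketch tries to reproduce.

```python
from itertools import groupby
from operator import itemgetter

# Assumption: "id" and "order" are hypothetical fields; A has two rows for id 1.
a = [{"id": 1, "order": "o1"}, {"id": 1, "order": "o2"}, {"id": 2, "order": "o3"}]
b = [{"id": 1}, {"id": 2}]

# Lookup style: a dict holds ONE row per key (in Python, later duplicates overwrite earlier ones).
index = {row["id"]: row for row in a}
lookup_result = [{**r, **index[r["id"]]} for r in b if r["id"] in index]


def merge_join_all(left, right, key="id"):
    """Sort both sides on `key`, then walk the key groups in step, emitting every pair."""
    get = itemgetter(key)
    lg = groupby(sorted(left, key=get), key=get)
    rg = groupby(sorted(right, key=get), key=get)
    out = []
    lk, lrows = next(lg, (None, None))
    rk, rrows = next(rg, (None, None))
    while lrows is not None and rrows is not None:
        if lk < rk:
            lk, lrows = next(lg, (None, None))  # skip a left group with no partner
        elif lk > rk:
            rk, rrows = next(rg, (None, None))  # skip a right group with no partner
        else:
            rlist = list(rrows)  # materialise the right group before pairing
            for lrow in lrows:
                for rrow in rlist:
                    out.append({**lrow, **rrow})
            lk, lrows = next(lg, (None, None))
            rk, rrows = next(rg, (None, None))
    return out


merge_result = merge_join_all(b, a)
print(len(lookup_result))  # 2: only one "order" survives for id 1
print(len(merge_result))   # 3: both orders for id 1, plus id 2
```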

    If you can, avoid the Database Join and Database Lookup steps for performance reasons: they typically query the database once per incoming row.
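To make the performance point concrete, the sketch below uses Python's built-in `sqlite3` module as a stand-in database (the table `a` and its `id` column are hypothetical) and counts the SQL statements executed when each B identifier is checked with its own query, versus reading A once and looking identifiers up in memory. It only models the per-row query pattern; it does not measure Pentaho itself.

```python
import sqlite3

# Stand-in database: hypothetical table "a" holding the even ids 0..998.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO a (id) VALUES (?)", [(i,) for i in range(0, 1000, 2)])

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement sqlite runs

b_ids = list(range(1000))

# Per-row pattern: one query per B row, similar in spirit to a database join/lookup step.
per_row = [
    i for i in b_ids
    if conn.execute("SELECT 1 FROM a WHERE id = ?", (i,)).fetchone() is not None
]
per_row_queries = len(statements)

statements.clear()
# Bulk pattern: read A once, then look identifiers up in an in-memory set.
a_ids = {row[0] for row in conn.execute("SELECT id FROM a")}
bulk = [i for i in b_ids if i in a_ids]
bulk_queries = len(statements)

print(per_row_queries, bulk_queries)
```

Both patterns keep the same 500 identifiers; the difference is roughly one round trip per row versus a single query, which matters far more against a remote database than against in-memory SQLite.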