I've got this question:
I've got two sources, A and B, and a Merge Join step (with the INNER option). The image shows what I am facing.
I am getting the right identifiers, but with the value of the latest row repeated n times for each one of them.
I need to get all the identifiers from B that are present in A.
I know there are also these options: Database Join and Database Lookup, but they could be kind of slow given that I have a lot of data to check.
What component should I use to get the expected result in Pentaho?
Regards.
I could not replicate the issue.
The most probable errors are
Now, I think your goal is to filter out from B all the rows whose identifier is not in A. I suggest reversing the flow: for each row of B, look up the identifier in A, and then filter out the rows whose identifier was not found in A.
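To make the lookup-then-filter idea concrete, here is a minimal sketch in plain Python rather than Pentaho itself. The function name `lookup_filter`, the field names `id` and `value`, and the sample rows are all made up for illustration; they only mirror the logic of looking up each B identifier in A and dropping the rows that are not found.

```python
def lookup_filter(rows_a, rows_b, key="id"):
    """Keep only the rows of B whose identifier also appears in A."""
    # Load the identifiers of A into memory once (hypothetical in-memory lookup).
    ids_in_a = {row[key] for row in rows_a}
    # Stream over B and keep the rows whose identifier was found in A.
    return [row for row in rows_b if row[key] in ids_in_a]


# Made-up sample data: A holds the reference identifiers, B holds the rows to check.
rows_a = [{"id": 1}, {"id": 2}, {"id": 4}]
rows_b = [
    {"id": 1, "value": "x"},
    {"id": 3, "value": "y"},
    {"id": 4, "value": "z"},
]

kept = lookup_filter(rows_a, rows_b)
print(kept)
```

Each row of B is emitted at most once and keeps its own values, since A only contributes a yes/no answer for the identifier.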
As a general rule, prefer the Lookup step. It is very fast and closer to the human way of thinking than SQL joins.
If you need to retrieve more than one record for each input row, then use the Merge Join (and sort the input flows).
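As an illustration of why the input flows have to be sorted, here is a rough sketch of an inner sort-merge join in plain Python. It is not Pentaho's implementation; `merge_join_inner`, the `id` and `value` fields, and the sample rows are assumptions made for the example. Both inputs are sorted on the key first, then walked in parallel, and every matching pair of rows is emitted.

```python
from itertools import groupby
from operator import itemgetter


def merge_join_inner(left, right, key="id"):
    """Inner join of two row lists on `key` using a sort-merge pass."""
    get_key = itemgetter(key)
    # Sort both inputs on the join key and group rows that share a key.
    left_groups = [(k, list(g)) for k, g in groupby(sorted(left, key=get_key), key=get_key)]
    right_groups = [(k, list(g)) for k, g in groupby(sorted(right, key=get_key), key=get_key)]

    joined = []
    i = j = 0
    while i < len(left_groups) and j < len(right_groups):
        left_key, left_rows = left_groups[i]
        right_key, right_rows = right_groups[j]
        if left_key < right_key:
            i += 1  # key only on the left side: no match in an inner join
        elif left_key > right_key:
            j += 1  # key only on the right side: no match in an inner join
        else:
            # Same key on both sides: emit every left/right combination.
            joined.extend({**l, **r} for l in left_rows for r in right_rows)
            i += 1
            j += 1
    return joined


# Made-up sample data: identifier 1 appears twice in B, identifier 3 is not in A.
rows_a = [{"id": 2}, {"id": 1}]
rows_b = [
    {"id": 1, "value": "x"},
    {"id": 2, "value": "y"},
    {"id": 1, "value": "z"},
    {"id": 3, "value": "w"},
]

joined = merge_join_inner(rows_a, rows_b)
print(joined)
```

The parallel walk only compares the current key on each side, so it relies on both inputs being in key order; that is the reason for sorting before the join.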
If you can, avoid Database Join and Database Lookup for performance reasons.