Search code examples
pentahokettlepentaho-spoonpentaho-data-integrationpdi

Pentaho Data Integration - Merge rows (diff) step not executing with Table Input "Replace variables in script"


I have a PDI Transformation that is using the Merge rows (diff) step comparing rows from two Table Input steps (with Sorts in between). (I linked a screenshot of the Transformation at the bottom of the post)

The queries in the Table Input require parameters to be passed, so the options Replace variables in script?, Insert data from step, and Execute for each row? are all active / filled.
With these options disabled / empty and a value hardcoded where the parameter is now defined, the Merge rows (diff) is working properly and marking the rows with the correct state (new, changed, ...).
On the other hand, if I try to run this Transformation with the options active, passing a parameter, the Merge rows (diff) seems to not be executed - there is a Write to log step after it and it is not printing anything, opposed to when the options are disabled and the "parameter" is hardcoded.

Has anyone faced a similar issue? Is the Merge rows (diff) step not implemented to work in this scenario?

Thank you in advance to everyone that might have an idea or suggestion.
Cheers

Screenshot of the simplified PDI Transformation.

Update #1: Add screenshot of Step Metrics as requested in the comments. For the entity that was tested, Compare Rows returns 4 rows, while Reference Rows does not - this had the desired result of having 4 rows marked as new by the Merge rows (diff) step. I will also duplicate this entity to the Reference Rows table and upload a new screenshot with Step Metrics where both Table Input queries return the same rows.

enter image description here

Update #2: Add seconds screenshot of Step Metrics - where both Table Input steps output rows (and the same rows). Merge rows (diff) still does not output anything. Uploading this screenshot I noticed an issue - the Sort steps were not returning rows, so I replaced them with an ORDER BY clause in the SQL queries in the Table Input steps, added the Write to log steps, and now the Merge rows (diff) step is getting and outputting rows - even though it should be outputting 4 as identical instead of the 3 it does, but that is outside the scope of this post.

enter image description here enter image description here


Solution

  • The "fix" was to remove the Sort row steps and replace the ordering by an ORDER BY in the SQL query of the Table Input steps.