I have the following job that takes the row count of a sequential file. When running on multiple nodes, for example 4, I am getting 4 different row counts. How would I go about outputting only one row with the row count?
Source --> transformer (this has a dummy row for counting) --> Aggregator Stage --> Sequential File
Any help would be greatly appreciated!! Thanks!
You could set the Execution mode to Sequential for the aggregator stage (Stage-Advanced tab). If the job isn't processing large volumes of data you may not experience much of a slow-down in performance. If you DO process large volumes then it makes more sense to leave the aggregator as Parallel, then add another aggregator and set THAT to sequential instead, so that it sums the per-node counts into a single row.
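To see why the two-aggregator layout gives one row, here is a rough Python sketch of the idea. It is not DataStage code: the function names, the round-robin partitioning and the 10-row sample are all assumptions made up for illustration. Each "node" counts only its own partition (which is why you see 4 counts), and a final single-instance step sums those partial counts.

```python
from typing import List, Sequence


def partition_round_robin(rows: Sequence[str], num_partitions: int) -> List[List[str]]:
    """Spread rows across partitions, one per node (hypothetical partitioning)."""
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def parallel_count(partitions: List[List[str]]) -> List[int]:
    """First aggregator, run in parallel: one count row per partition."""
    return [len(p) for p in partitions]


def sequential_total(partial_counts: List[int]) -> List[int]:
    """Second aggregator, run sequentially: sums the partial counts into one row."""
    return [sum(partial_counts)]


rows = [f"row{i}" for i in range(10)]          # made-up sample data
partitions = partition_round_robin(rows, 4)    # 4 nodes
partial = parallel_count(partitions)           # one count per node
total = sequential_total(partial)              # a single output row

print(partial)
print(total)
```

Note that the second step has to sum the counts coming from the first aggregator rather than count rows again, otherwise it would just report the number of nodes.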