How to make groups in an input and select a specific row in each of them in Talend?


I am working on a Talend transformation process (we are using Talend 6.4), and I don't know how to implement the current requirement.

I have an input consisting of:

  • Two columns that are my group keys (Account and Product) but are not unique (the same Account x Product pair can appear in multiple rows)
  • A criterion column (Contract end date), which determines which row to keep for each group
  • Some "tail" data that needs to be passed to the next processing step (the contract number)

The rule to implement is:

  • Keep only one record per group
  • The selected record must be one with no end date or, if all records in the group have an end date, the one with the latest end date
  • If there is a tie, any of the tied records may be selected

See the transformation applying those rules on some dummy data: [image: description of the transformation]

My first idea was to do the following:

  1. sort by Account, Product, End_date (nulls first)
  2. "select first" in each group

but I am not skilled enough to know whether the second transformation exists in Talend.
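
For illustration, the sort-then-select-first idea can be sketched in plain Java, the language Talend jobs are generated in. This is only a sketch under assumptions, not a Talend job: the class `SelectFirstPerGroup`, the row type `Contract`, and its field names are hypothetical. Note that for the first row of each group to satisfy the rule, the end dates that are not null have to be sorted in descending order after the nulls.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SelectFirstPerGroup {

    // Hypothetical row type: two group keys, the criterion, and the "tail" data.
    public static class Contract {
        public final String account;
        public final String product;
        public final LocalDate endDate;      // null means "no end date"
        public final String contractNumber;

        public Contract(String account, String product, LocalDate endDate, String contractNumber) {
            this.account = account;
            this.product = product;
            this.endDate = endDate;
            this.contractNumber = contractNumber;
        }
    }

    public static List<Contract> selectOnePerGroup(List<Contract> rows) {
        // Step 1: sort by Account, Product, then End_date with nulls first
        // and the remaining dates from latest to earliest.
        Comparator<Contract> order = Comparator
                .comparing((Contract c) -> c.account)
                .thenComparing(c -> c.product)
                .thenComparing(c -> c.endDate,
                        Comparator.nullsFirst(Comparator.<LocalDate>reverseOrder()));
        List<Contract> sorted = new ArrayList<>(rows);
        sorted.sort(order);

        // Step 2: "select first" in each group. putIfAbsent keeps only the
        // first row seen for each (account, product) key.
        Map<List<String>, Contract> firstPerGroup = new LinkedHashMap<>();
        for (Contract c : sorted) {
            firstPerGroup.putIfAbsent(Arrays.asList(c.account, c.product), c);
        }
        return new ArrayList<>(firstPerGroup.values());
    }
}
```

Because the sort is stable and ties are allowed to resolve arbitrarily, whichever tied row comes first in the input is the one kept.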

[image: description of this solution]

Regards,

Pierre


Solution

  • Very interesting Talend question. You need to create something like this job.

    [images: three screenshots of the job]

    Here is a link to the zip file to import into your Talend.