
How to remove duplication with Talend Data Preparation?


I would like to remove duplicate rows with Talend Data Preparation. My table has a column named HOURS; I want to add those hours together and remove the duplicated emails and names. Here is an example of my table:

[Image: example of the table]

As you can see, many rows share the same user_name and email but have different HOURS values. I want to sum the hours for each user_name and email pair and, at the same time, remove the duplicate user_name/email rows.
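To make the goal concrete, here is a minimal Python sketch of the desired result. It assumes hypothetical rows, since the original table was only posted as an image: the output has one row per user_name/email pair with HOURS summed, i.e. a GROUP BY with a SUM. It only illustrates the expected result; it is not a Data Prep feature.

```python
from collections import defaultdict

# Hypothetical sample rows; the real table from the question is not available.
rows = [
    {"user_name": "alice", "email": "alice@example.com", "HOURS": 3.0},
    {"user_name": "alice", "email": "alice@example.com", "HOURS": 2.5},
    {"user_name": "bob", "email": "bob@example.com", "HOURS": 4.0},
    {"user_name": "bob", "email": "bob@example.com", "HOURS": 1.0},
    {"user_name": "carol", "email": "carol@example.com", "HOURS": 6.0},
]

def sum_hours_by_user(rows):
    """Return one row per (user_name, email) pair with HOURS summed."""
    totals = defaultdict(float)  # (user_name, email) -> total hours
    for row in rows:
        totals[(row["user_name"], row["email"])] += row["HOURS"]
    # dicts keep insertion order, so output follows each pair's first appearance
    return [
        {"user_name": user, "email": email, "HOURS": hours}
        for (user, email), hours in totals.items()
    ]

result = sum_hours_by_user(rows)
for r in result:
    print(r)
```

Each duplicated user_name/email pair collapses into a single row, so alice ends up with 5.5 hours and bob with 5.0.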


Solution

  • (I am not really into Data Prep, so perhaps there is a built-in solution that I don't know of.)

    I don't think you can do a GROUP BY with a SUM in Talend Data Preparation: the tool works on individual rows (cleaning and correcting values) and can't perform aggregations.

    Instead, export your corrected data from Data Prep and sum it with a tAggregateRow component in Talend Data Integration, grouping by user_name and email.
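As a rough illustration of what that aggregation step does, the sketch below reads a CSV like one exported from Data Prep, groups by user_name and email, and sums HOURS. This is similar in spirit to a tAggregateRow that groups by those two columns with a sum operation on HOURS. The CSV content, the `;` separator and the `aggregate` helper are hypothetical; in Talend you would configure the component rather than write this code.

```python
import csv
import io

# Hypothetical CSV standing in for the file exported from Data Prep;
# the column names and the ';' separator are assumptions.
EXPORTED_CSV = """user_name;email;HOURS
alice;alice@example.com;3
alice;alice@example.com;2.5
bob;bob@example.com;4
bob;bob@example.com;1
"""

def aggregate(csv_text, group_by, sum_column, delimiter=";"):
    """Hypothetical helper: group rows by the `group_by` columns and sum
    `sum_column`, roughly what a tAggregateRow with a sum operation does."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    totals = {}  # tuple of group-by values -> running sum
    for row in reader:
        key = tuple(row[col] for col in group_by)
        totals[key] = totals.get(key, 0.0) + float(row[sum_column])
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow([*group_by, sum_column])
    for key, total in totals.items():
        writer.writerow([*key, total])
    return out.getvalue()

result = aggregate(EXPORTED_CSV, ["user_name", "email"], "HOURS")
print(result)
```

Keeping the group-by columns and the summed column as parameters mirrors the way the component separates "which columns identify a row" from "which column gets aggregated".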