I would like to remove duplicates in Talend Data Preparation. I have a column named HOURS, and I want to sum those hours while removing the duplicated emails and names. Here is an example of my table:
As you can see, many rows have the same user_name and email but different hours. I want to add the hours together for each user_name and email pair and, at the same time, remove the duplicate user_name and email rows.
(I am not very familiar with Data Prep, so perhaps there is a built-in solution that I don't know of.)
I don't think you can do a GROUP BY with a SUM in Talend Data Preparation: as far as I know, the tool can only correct individual lines of data and can't perform aggregation operations.
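You'll be able to sum your data with a tAggregateRow component in Talend Data Integration, after exporting your corrected data from Data Prep. In the component, set user_name and email as the group-by columns and add a sum operation on HOURS.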
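
To make the expected result concrete, the aggregation amounts to a group-by on user_name and email with a sum on HOURS. Here is a minimal Python sketch of that logic, not Talend code. It assumes the exported data is a CSV with columns named user_name, email and HOURS; the sample rows and the `sum_hours` helper are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real data would come from the Data Prep export.
SAMPLE_CSV = """user_name,email,HOURS
alice,alice@example.com,2
alice,alice@example.com,3.5
bob,bob@example.com,1
"""


def sum_hours(rows):
    """Group rows by (user_name, email) and sum HOURS within each group."""
    totals = defaultdict(float)
    for row in rows:
        key = (row["user_name"], row["email"])
        totals[key] += float(row["HOURS"])
    # One output row per distinct (user_name, email) pair, in first-seen order.
    return [
        {"user_name": user, "email": email, "HOURS": hours}
        for (user, email), hours in totals.items()
    ]


result = sum_hours(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in result:
    print(row)
```

Each user_name and email pair ends up on a single row carrying the total of its hours, which is the same shape of output you should get from tAggregateRow.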