I have a single input stream from a csv file. This stream is from a User Table, where the user may be duplicated with different ID's.
I need to clean this stream. The combo of [Name + Surname] always is unique, but I need to make a comparison before selecting one, because I want to keep the information of the most recent and create a new field based on the comparison of other value.
I have a Timestamp that will be used for selecting the row.
This is a example of input stream:
And this is a example of the output I want: What tool would be the best for this task?
Thanks in advance.
This should work with a bit of sorting and grouping in pentaho kettle.