Mulesoft: Remove duplicate records by checking dateField using dataweave


I have a CSV file with the following data:

Id,Name,Type,date
1,name1,employee,25/04/2017
2,name2,contrator,26/04/2017
3,name3,employee,25/04/2017
4,name4,contrator,26/04/2017
5,name5,employee,24/04/2017
6,name6,contrator,24/04/2017
7,name7,employee,25/04/2017
8,name8,contrator,24/04/2017
9,name9,employee,24/04/2017
10,name10,contrator,26/04/2017
6,name6,employee,27/04/2017
11,name11,employee,27/04/2017
12,name12,contrator,27/04/2017

If two rows have the same Id, only the row with the latest date should be kept; the row with the older date should be removed. For example, the input above has two rows with Id 6, so the row dated 24/04/2017 should be removed. The output should look like this:

Id,Name,Type,date
1,name1,employee,25/04/2017
2,name2,contrator,26/04/2017
3,name3,employee,25/04/2017
4,name4,contrator,26/04/2017
5,name5,employee,24/04/2017
6,name6,employee,27/04/2017
7,name7,employee,25/04/2017
8,name8,contrator,24/04/2017
9,name9,employee,24/04/2017
10,name10,contrator,26/04/2017
11,name11,employee,27/04/2017
12,name12,contrator,27/04/2017

I need to achieve this using DataWeave. Please suggest a solution.


Solution

  • Here is the DataWeave script you are looking for:

    %dw 1.0
    %output application/csv
    %var toDate = (str) -> str as :date { format: "dd/MM/yyyy" }
    %var maxDate = (a, b) -> a when toDate(a.date) > toDate(b.date) otherwise b
    ---
    // Group rows by Id, then reduce each group to the row with the latest date
    payload groupBy $.Id
        pluck $ map ($ reduce ((val, acc) -> maxDate(val, acc)))
    

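The script above is written for DataWeave 1.0 (Mule 3). For anyone on Mule 4, the same group-then-keep-latest idea might be sketched in DataWeave 2.0 roughly as follows. This is an untested sketch that assumes the payload is the CSV shown in the question, with the same `Id` and `date` columns; the helper name `toDate` is carried over from the answer purely for illustration.

```dataweave
%dw 2.0
output application/csv
// Parse the dd/MM/yyyy strings so dates compare chronologically, not as text
fun toDate(str) = str as Date {format: "dd/MM/yyyy"}
---
// Group rows by Id, then keep the row with the latest date from each group
(payload groupBy $.Id) pluck ((rows) -> rows maxBy ((row) -> toDate(row.date)))
```

Here `maxBy` stands in for the explicit `reduce` of the 1.0 version: each group of rows sharing an Id collapses to the single row whose parsed date is greatest, so for Id 6 the 27/04/2017 row would be kept.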