ADF/Data Flow-- I want to sort my incoming data based on the date column. My requirement is to take only latest date data(previous date data) for the ID, and ID can be duplicate with different dates. I have tries sorting based on both id and date column but it is not working for all the records. Can someone please help me to get this sorted in Data flow of Azure data factory!
I need to pick only those IDs which are having latest date from a Date column.Also how can I extract only day from date type column? example '23-05-2023' I want on 23 from this date column.
To pick only the latest date data for each ID, you can use the Aggregate
transformation and group by ID
column and use the Max
function to get the maximum date value for each group. To extract only the day from a date type column, you can use the Derived Column
transformation and use the dayOfMonth()
function to extract the day value from the date column. Below are the steps.
group by: id
aggregates: date = max(date)
This groups the data by the ID
column, and gets the maximum Date
value for each group.
Day
and give the expression as split(toString(date),'-')[3]
or dayOfMonth(date)
to extract the day value from date
column
Derived column Transformation -Output:
Dataflow expression
source(output(
id as short,
date as date
),
allowSchemaDrift: true,
validateSchema: false,
ignoreNoFilesFound: false,
dateFormats: ['dd-MM-yyyy']) ~> source1
source1 aggregate(groupBy(id),
date = max(date)) ~> aggregate1
aggregate1 derive(Day = split(toString(date),'-')[3],
column1 = dayOfMonth(date)) ~> derivedColumn1