Is there any method in Azure Data Factory to get the earliest last modified date from Azure Data Lake? The filename can be anything. I need the last modified date of the very first file uploaded to the data lake.
For example:
+----------+------------------+
| Filename | LastModifiedDate |
+----------+------------------+
| File1 | 2021-10-01 |
| File2 | 2021-10-02 |
| File1 | 2021-10-03 |
+----------+------------------+
Expected output: 2021-10-01
Any help would be appreciated. Regards, Sandeep
You could go through each folder in the data lake with the Get Metadata activity, as is done in this archived question on the MSFT forum.
Depending on the number of folders and files, this is a rather brute-force way of retrieving the earliest date of any file in your data lake.
I found it easier to use PowerShell:
# Placeholders: replace these three values with your own
$storageAccount = 'storageAccountName'
$resourceGroupName = 'resourceGroupName'
$containerName = 'containerName'
# Use the first access key of the storage account to build a storage context
$storageAccountKey = (Get-AzStorageAccountKey -ResourceGroupName $resourceGroupName -Name $storageAccount | Select-Object -Property Value -First 1).Value
$context = New-AzStorageContext -StorageAccountName $storageAccount -StorageAccountKey $storageAccountKey
# List every blob in the container, then keep the one with the earliest LastModified
$allBlobs = Get-AzStorageBlob -Container $containerName -Context $context
$allBlobs | Sort-Object -Property LastModified | Select-Object -Property Name, LastModified -First 1
This PowerShell script returns the Name and LastModified datetime of the file with the earliest LastModified value. However, running a PowerShell script directly from ADF is not so straightforward. Here is an article by Bob Blackburn on how to achieve this.
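If you want to check the sort-and-take-first step without connecting to Azure, here is a minimal sketch that runs it on the rows from the question's table. The $sampleBlobs, $earliest and $earliestDate names are hypothetical, and the objects are plain in-memory stand-ins, not real output of Get-AzStorageBlob; I am only assuming they expose Name and LastModified the same way.

```powershell
# Hypothetical in-memory stand-ins for the blobs, using the rows
# from the question's table (not real Get-AzStorageBlob output)
$sampleBlobs = @(
    [pscustomobject]@{ Name = 'File1'; LastModified = [datetime]'2021-10-01' }
    [pscustomobject]@{ Name = 'File2'; LastModified = [datetime]'2021-10-02' }
    [pscustomobject]@{ Name = 'File1'; LastModified = [datetime]'2021-10-03' }
)

# Sort ascending by LastModified and keep the first (earliest) entry
$earliest = $sampleBlobs | Sort-Object -Property LastModified | Select-Object -First 1

# Keep only the date part, in the format of the expected output
$earliestDate = $earliest.LastModified.ToString('yyyy-MM-dd', [cultureinfo]::InvariantCulture)
$earliestDate
```

For the sample rows this prints 2021-10-01, which matches the expected output in the question. Sorting is fine for a modest number of blobs; for a very large container, a single pass that tracks the running minimum would avoid sorting the whole list.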