azure-data-factory azure-data-lake-gen2

Extract the very first last modified date from an Azure Data Lake topic using Azure Data Factory


Is there any method available in Azure Data Factory to get the very first last modified date from Azure Data Lake? The filename can be anything. I need the last modified date of the very first file uploaded to the data lake topic.

For example:

+----------+------------------+
| Filename | LastModifiedDate |
+----------+------------------+
| File1    | 2021-10-01       |
| File2    | 2021-10-02       |
| File1    | 2021-10-03       |
+----------+------------------+

Expected output: 2021-10-01

Any help would be appreciated. Regards, Sandeep


Solution

  • You could go through each folder in the data lake with the Get Metadata activity, as is done in this archived question on the MSFT forum.

    Depending on the number of folders and files, this is a rather brute-force way of retrieving the earliest date of any file in your data lake.

    I found it easier to use PowerShell:

    # Placeholders: replace with your own resource names.
    $storageAccount = 'storageAccountName'
    $resourceGroupName = 'resourceGroupName'
    $containerName = 'containerName'
    # Use the first access key of the storage account to build a storage context.
    $storageAccountKey = (Get-AzStorageAccountKey -ResourceGroupName $resourceGroupName -Name $storageAccount | Select-Object -Property Value -First 1).Value
    $context = New-AzStorageContext -StorageAccountName $storageAccount -StorageAccountKey $storageAccountKey
    # List every blob in the container, sort oldest first, and keep only the first one.
    $allBlobs = Get-AzStorageBlob -Container $containerName -Context $context
    $allBlobs | Sort-Object -Property LastModified | Select-Object -Property Name,LastModified -First 1
    

    This PowerShell script returns the Name and LastModified datetime of the file with the earliest LastModified value. However, running a PowerShell script directly from ADF is not so straightforward. Here is an article by Bob Blackburn on how to achieve this.
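
To check the sort-and-take-first step without connecting to Azure, the same pipeline can be run against in-memory objects. This is only a minimal sketch: the `$files` array is a hypothetical stand-in built from the table in the question, not real `Get-AzStorageBlob` output, and it assumes each item exposes `Name` and `LastModified` properties as the script above does.

```powershell
# Hypothetical sample rows copied from the table in the question.
$files = @(
    [pscustomobject]@{ Name = 'File1'; LastModified = [datetime]'2021-10-01' }
    [pscustomobject]@{ Name = 'File2'; LastModified = [datetime]'2021-10-02' }
    [pscustomobject]@{ Name = 'File1'; LastModified = [datetime]'2021-10-03' }
)

# Sort ascending by LastModified and keep only the first (earliest) row.
$earliest = $files | Sort-Object -Property LastModified | Select-Object -Property Name,LastModified -First 1

# Format the date the way the question expects it.
$earliestDate = $earliest.LastModified.ToString('yyyy-MM-dd')
$earliestDate
```

With the sample rows above, `$earliestDate` holds `2021-10-01`, which matches the expected output in the question.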