Tags: azure, azure-data-factory, azure-data-lake, delta-lake

How can I transition from Azure Data Lake, with data partitioned by date folders, into Delta Lake?


I own an Azure Data Lake Storage Gen2 account with data partitioned into nested datetime folders.

I want to provide the Delta Lake format to my team, but I am not sure whether I should create a new storage account and copy the data into Delta format, or whether it would be best practice to convert the current Azure Data Lake into Delta Lake format.

Could anyone provide any tips on this matter?


Solution

    AFAIK, in ADF the Delta format is supported only as an inline dataset, and inline datasets are available only in mapping data flows.

    So my suggestion is to use a data flow for this.
    Because your data sits in nested datetime folders, I reproduced the setup with sample date folders and uploaded a sample CSV file into each of the folders 10 and 9.

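To try the approach locally before pointing a data flow at the real account, the sketch below builds a small nested date-folder tree with one CSV file per folder. It is a minimal sketch assuming a year/month hierarchy: the year folder 2022, the file name sample.csv, and the id/value columns are hypothetical stand-ins, since only the folder names 10 and 9 appear in the answer.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, Sequence

# Hypothetical sample rows and columns; the real files' schema is not shown in the answer.
SAMPLE_ROWS = [
    {"id": "1", "value": "a"},
    {"id": "2", "value": "b"},
]


def make_sample_layout(root: Path, year: str = "2022", months: Sequence[str] = ("10", "9")) -> List[Path]:
    """Create <root>/<year>/<month>/sample.csv for each month and return the file paths.

    The year folder and file name are assumptions for illustration only.
    """
    created = []
    for month in months:
        folder = root / year / month
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / "sample.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["id", "value"])
            writer.writeheader()
            writer.writerows(SAMPLE_ROWS)
        created.append(path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for p in make_sample_layout(Path(tmp)):
            print(p.relative_to(tmp).as_posix())
```

Uploading a tree like this to a test container gives the data flow a small, known input to match against.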

    Create a data flow in ADF. In the source, select Inline as the source type so that you can give the wildcard path you need. Select your data format (Delimited text in my case) and give the linked service as well.

    Assuming that your nested folder structure is the same for all files, give a wildcard path whose depth matches your folder levels.

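The exact wildcard path from the answer is not recoverable, so the pattern used here is hypothetical. As a rough local analogy, Python's pathlib glob treats * as matching within a single path segment, which is why a pattern needs one * per folder level to reach files at a fixed depth. ADF's own wildcard rules may differ in details, so this illustrates the idea rather than ADF's matcher.

```python
import tempfile
from pathlib import Path
from typing import List


def match_wildcard(root: Path, pattern: str) -> List[str]:
    """Return files under `root` matching a glob `pattern`, as sorted POSIX-style relative paths.

    Each `*` matches within one path segment, so "*/*/*.csv" only reaches
    files exactly two folders below `root`.
    """
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical files: two at year/month depth, one stray file one level up.
        for rel in ["2022/10/sample.csv", "2022/9/sample.csv", "2022/stray.csv"]:
            (root / rel).parent.mkdir(parents=True, exist_ok=True)
            (root / rel).write_text("id,value\n1,a\n")
        print(match_wildcard(root, "*/*/*.csv"))
```

A file that sits at a different depth (like stray.csv here) is silently skipped, which is why the answer assumes the same nested structure for all files.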

    Now, create a sink with Delta as the inline dataset type.

    Give the linked service here as well.
    In the sink settings, specify the folder for your Delta files and the update method.

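The update method decides how incoming rows are applied to the existing Delta table. The sketch below is a conceptual, in-memory model of insert, update, upsert, and delete semantics keyed on one column; it is not ADF's implementation, and the function name, the single key column, and the assumption of unique keys in the existing table are hypothetical simplifications. In ADF, the options other than insert generally need key columns and rows marked upstream (for example with an Alter Row transformation), so check the sink documentation for the exact requirements.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def apply_update_method(table: List[Row], incoming: Iterable[Row], method: str, key: str) -> List[Row]:
    """Conceptual model of sink update methods on an in-memory table (not ADF's implementation).

    insert: append every incoming row.
    update: overwrite existing rows whose key matches; ignore rows with new keys.
    upsert: overwrite matching rows and append rows with new keys.
    delete: drop existing rows whose key matches an incoming row.
    """
    incoming = list(incoming)
    if method == "insert":
        return [dict(r) for r in table] + [dict(r) for r in incoming]
    if method == "delete":
        doomed = {r[key] for r in incoming}
        return [dict(r) for r in table if r[key] not in doomed]
    if method not in ("update", "upsert"):
        raise ValueError(f"unknown update method: {method!r}")
    result = [dict(r) for r in table]
    # Assumes keys are unique in the existing table (hypothetical simplification).
    position = {r[key]: i for i, r in enumerate(result)}
    for row in incoming:
        k = row[key]
        if k in position:
            result[position[k]] = dict(row)  # matching key: overwrite the existing row
        elif method == "upsert":
            position[k] = len(result)  # new key: upsert appends it
            result.append(dict(row))
    return result


if __name__ == "__main__":
    existing = [{"id": "1", "value": "a"}, {"id": "2", "value": "b"}]
    batch = [{"id": "2", "value": "B"}, {"id": "3", "value": "c"}]
    for m in ("insert", "update", "upsert", "delete"):
        print(m, apply_update_method(existing, batch, m, key="id"))
```

For a one-time copy of existing files, insert is usually enough; upsert becomes relevant once later loads can bring rows with keys that are already in the table.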

    After execution, you can see that the Delta-format files were created in the folder path.

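A Delta table is a folder of Parquet data files plus a _delta_log subfolder of JSON commit files, and that log is what distinguishes it from a plain folder of Parquet files. As a quick check of the sink output, the sketch below replays the add and remove actions in those commits to list the data files currently in the table. It is a minimal sketch: it ignores Parquet checkpoint files, so it assumes every commit since version 0 is still present as JSON, and for real workloads a Delta-aware reader (Spark or a Delta Lake client library) is the safer option.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def active_data_files(table_path: str) -> List[str]:
    """List the data files currently in a Delta table by replaying its JSON commit log.

    Simplification: only the JSON commits in _delta_log are read; Parquet checkpoints
    are ignored, so this assumes every commit since version 0 is still available as JSON.
    """
    log_dir = Path(table_path) / "_delta_log"
    active = set()
    # Commit files are named by zero-padded version number, so sorting by name
    # replays them in commit order.
    for commit in sorted(log_dir.glob("*.json")):
        with commit.open() as f:
            for line in f:  # one JSON action per line
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    active.add(action["add"]["path"])
                elif "remove" in action:
                    active.discard(action["remove"]["path"])
    return sorted(active)


if __name__ == "__main__":
    # Demo on a hand-made, hypothetical log with two commits.
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "_delta_log"
        log.mkdir()
        (log / "00000000000000000000.json").write_text(
            json.dumps({"add": {"path": "part-0000.parquet"}}) + "\n"
            + json.dumps({"add": {"path": "part-0001.parquet"}}) + "\n"
        )
        (log / "00000000000000000001.json").write_text(
            json.dumps({"remove": {"path": "part-0000.parquet"}}) + "\n"
        )
        print(active_data_files(tmp))
```

Pointing active_data_files at a local copy of the sink folder should list the Parquet files written by the data flow; an empty result or a missing _delta_log folder suggests the output is not a Delta table.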