azureazure-data-factorypartitioninghana

Partition 0CALDAY (yyyymmdd) column to year, month and day in a copy activity of Azure Data Factory


I am using SAP HANA connector to copy the data from SAP HANA to Azure data lake. In my query there is a column called 0CALDAY which is a string data type and has row values like 20230119. I would like to use it as a partition column. Simply putting partition column name in the dynamic range option works fine but what i would like to achieve is the partitions at year, month and day level in this case 2023/01/19.

I tried this

  1. @concat('/year=', formatDateTime(utcnow(), 'yyyy'), '/month=', formatDateTime(utcnow(), 'MM'), '/day=', formatDateTime(utcnow(), 'dd')) : [LIBODBCHDB DLL][HDBODBC] General error;260 invalid column name: /year=2023/month=09/day=14
  2. Replacing utcnow() with the 0CALDAY column but it is also not working rather throwing an error of invalid column name.

This link provides some basic information about partitioning but is also not helping to achieve my goal.

I will appreciate any help on this one! Thanks


Solution

  • Partition option in source settings of copy activity is used for parallel copying of data from source to sink. It does not partition the sink datastore. When huge data needs to be copied from source to sink, you can use partition option in source. This will enhance the performance of copying data. In order to partition data in sink, you can use dataflow activity instead of copy activity. But in dataflow activity, SAPHANA as a source is not available. Therefore, you can stage the full data from SAP HANA to blob storage using copy activity and then use dataflow activity to partition data in sink. In order to do this in dataflow, follow the below steps.

    • In dataflow activity, add the source transformation with the staging data as source dataset.
    • Then add the derived column transformation, and give the value for new column folder_name as concat('/year=',substring({0CALDAY},1,4),'/month=',substring({0CALDAY},5,2),'/day=',substring({0CALDAY},7,2))

    enter image description here

    • Then take the sink transformation with blob storage as sink dataset. In sink settings, give the file name option as Name folder as column data and give the column data as folder_name. Here folder_name is the column which is created in derived column transformation.

    enter image description here

    • Once the pipeline is run, data is partitioned in the required format. enter image description here