Search code examples
azure-data-factory

adf skip line count issue with csv having empty string


I am trying to copy csv files in blob storage to snowflake. first row as header and want to skip 2nd line in csv while using copy activity in azure data factory.

csv-file-rows:

ID,NAME,AGE,INCOME 
'','','',0 
123,'test',23,1045 

expected: first row as header, skip second row, insert remaining rows

Error code:

DELIMITEDTEXTCOLUMNNAMENOTALLOW NULL,EXCEPTION,Message=The name of column index 1 is empty


Solution

  • If you know the column names before the pipeline run, you can try the below approach using copy activity.

    For sample, I took the input csv data like below.

    ID,NAME,AGE,INCOME
    '','','',0
    123,test,23,1045
    124,test2,24,1046
    

    First uncheck the First row as header in the source csv dataset.

    enter image description here

    Now, give this to the copy activity source and give 2 in the Skip line count option.

    enter image description here

    Give your target dataset as copy activity sink and import the schemas in the mapping.

    enter image description here

    Here, give your required column names in the order.

    Now, run the copy activity and the 2nd line will be skipped in the target. For sample, I took the target as csv. In your case, it should be snowflake.

    enter image description here

    If you want to do this process dynamically without giving column names manually, you can use Dataflow. Give your input dataset as source in the dataflow. Make sure the First row as header of the dataset is checked in this case.

    Now, add a Surrogate Key transformation and add a column key with incrementing the value as 1.

    enter image description here

    Then, add a filter transformation with condition to skip the first row key!=1.

    enter image description here

    Add your snowflake dataset as sink here and execute the dataflow from the pipeline. It will give the desired results in the target.