
Azure Data Factory - dynamically deal with trailing column delimiters


I've got a blob container of tab-delimited files that I'm trying to import automatically into an Azure SQL database: one table per file, and every source file has a different schema.

I've got a ForEach loop that iterates over the files in the blob container; the Copy Data activity then uses the file metadata to pass parameters into a generic source dataset and automatically create the tables in the SQL database sink.

The process works fine - HOWEVER - the source files all have a trailing tab, which Data Factory interprets as a column without a name, so I get this error:

ErrorCode=DelimitedTextColumnNameNotAllowNull,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The name of column index 36 is empty. Make sure column name is properly specified in the header row.,Source=Microsoft.DataTransfer.Common,'
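For context, the behaviour is easy to reproduce outside ADF: a header row that ends in a tab parses into one extra, empty column name. A minimal Python sketch (the two-column sample is hypothetical; the real files here have ~36 columns):

```python
import csv
import io

# Tab-delimited content whose rows end with a trailing tab,
# mimicking the source files described above (hypothetical sample).
raw = "col_a\tcol_b\t\n1\t2\t\n"

reader = csv.reader(io.StringIO(raw), delimiter="\t")
header = next(reader)
print(header)  # ['col_a', 'col_b', ''] - note the empty trailing column name
```

That empty final name is exactly what the DelimitedTextColumnNameNotAllowNull error is complaining about.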

I think this partially answers my question: ADF Copy with trailing column delimiter

But how do I get it to work dynamically? The solution above manually specifies the 'valid' columns in the dataset to exclude the empty one, but my 'valid' columns change on each iteration of the ForEach loop, one set per source file, so I can't hardcode them.

Cheers


Solution

  • You cannot achieve the desired output with the Copy activity alone.

    Use a Data Flow for this instead. In the source, the Data Flow gives a projection like the one below.

    [screenshot: source projection]

    For the extra trailing tab (\t) column, it auto-generates a name of the form _c<index>, as shown above.

    So, filter out that extra column using a Select transformation with a rule-based mapping:

    instr(name,'_c')==0
    

    [screenshot: Select transformation rule-based mapping]

    Result:

    [screenshot: data preview of the result]

    Add the SQL table as the sink and it will copy the desired columns to the target table. Use dataset parameters for the source and sink to make the process dynamic.
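To make the Select rule concrete: ADF's instr returns the 1-based position of a substring, or 0 when it is absent, so instr(name,'_c')==0 keeps only columns whose names do not contain '_c'. A rough Python equivalent (the column names are hypothetical):

```python
def instr(haystack: str, needle: str) -> int:
    # Mimics ADF Data Flow instr(): 1-based position of needle, 0 if absent.
    return haystack.find(needle) + 1

# Hypothetical projection: real columns plus the auto-named trailing one.
columns = ["OrderId", "Amount", "_c36"]

# Keep only columns whose name does not contain '_c'.
kept = [c for c in columns if instr(c, "_c") == 0]
print(kept)  # ['OrderId', 'Amount']
```

One caveat: this rule would also drop any legitimate column whose name happens to contain '_c', so a stricter pattern (e.g. matching the exact _c-plus-digits form) may be safer for real data.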
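If you ever need the same behaviour outside a Data Flow, the idea - drop the header-less trailing column, then load whatever columns remain - can be sketched in plain Python. The file contents, table name, and the in-memory SQLite stand-in for the Azure SQL sink are all hypothetical:

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for one blob file: tab-delimited with trailing tabs.
raw = "OrderId\tAmount\t\n1\t9.99\t\n2\t4.50\t\n"

rows = list(csv.reader(io.StringIO(raw), delimiter="\t"))
header, data = rows[0], rows[1:]

# Keep only the columns that actually have a header name,
# which discards the empty trailing column.
keep = [i for i, name in enumerate(header) if name]
cols = [header[i] for i in keep]

with sqlite3.connect(":memory:") as conn:  # stand-in for the Azure SQL sink
    conn.execute(f"CREATE TABLE orders ({', '.join(c + ' TEXT' for c in cols)})")
    conn.executemany(
        f"INSERT INTO orders VALUES ({', '.join('?' for _ in cols)})",
        [[r[i] for i in keep] for r in data],
    )
    count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 2
```

Because the kept columns are derived from each file's own header row, the same logic works per-file without hardcoding a schema, which mirrors what the parameterised Data Flow does.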