Tags: etl, azure-data-factory

Data Factory: special characters in column headers


I have a file I am reading into a blob via Data Factory. It's formatted in Excel. Some of the column headers have special characters and spaces, which isn't good if I want to take it to CSV or Parquet and then SQL. Is there a way to correct this in the pipeline? Examples: "Activations in last 15 seconds high+Low", "first entry speed (serial T/a)".

Thanks


Solution

  • Normally, Data Flow can handle this for you by adding a Select transformation with a rule-based mapping:


    1. Uncheck "Auto mapping".
    2. Click "+ Add mapping".
    3. For the column match condition, enter "true()" so the rule applies to all columns.
    4. Enter an appropriate expression to rename the columns. This example uses a regular expression to remove any character that is not a letter.
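The rename step above can be sketched in plain Python. This is a minimal sketch, not Data Factory code: `clean_header` is a hypothetical helper that mirrors the rule's per-column rename (the exact Data Flow expression, something along the lines of `regexReplace($$, '[^a-zA-Z]', '')`, is an assumption here):

```python
import re

def clean_header(name: str) -> str:
    # Remove every character that is not a letter, mirroring the
    # rule-based rename applied to each column name.
    return re.sub(r"[^a-zA-Z]", "", name)

headers = [
    "Activations in last 15 seconds high+Low",
    "first entry speed (serial T/a)",
]
print([clean_header(h) for h in headers])
# → ['ActivationsinlastsecondshighLow', 'firstentryspeedserialTa']
```

Note that stripping everything but letters can collide two headers that differ only in digits or symbols; keeping digits with a pattern like `[^a-zA-Z0-9]` may be safer.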

    SPECIAL CASE

    There may be an issue with this if the column names contain forward slashes ("/"). I came across this accidentally in my testing:

    Every one of the unmapped columns contains forward slashes. Unfortunately, I cannot explain why this is the case, as Data Flow is clearly aware of the column names. It can be worked around manually by adding a Fixed rule for EACH offending column, which is obviously less than ideal.

    ANOTHER OPTION

    The other thing you could try is to pre-process the text file with another Data Flow, using a Source dataset that has no delimiters. This would give you the contents of each row as a single column. If you could get a handle on just the first row, you could remove the special characters there before parsing the file normally.
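The pre-processing idea can be sketched the same way. A minimal Python sketch, assuming a comma-delimited text file whose rows arrive as single undelimited strings; `clean_first_row` is a hypothetical helper, not a Data Factory function. It cleans only the header row and leaves the data rows untouched:

```python
import re

def clean_first_row(lines):
    # lines: the file contents, one whole row per string (no delimiter parsing).
    # Strip anything that is not a letter or the comma separator from the
    # header row only, then reassemble the file.
    if not lines:
        return lines
    header = re.sub(r"[^a-zA-Z,]", "", lines[0])
    return [header] + lines[1:]

raw = [
    "Activations in last 15 seconds high+Low,first entry speed (serial T/a)",
    "12,3.4",
]
print(clean_first_row(raw))
# → ['ActivationsinlastsecondshighLow,firstentryspeedserialTa', '12,3.4']
```

The comma is preserved so the cleaned header row still splits into the same number of columns as the data rows when the file is parsed with delimiters afterwards.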