Tags: csv, encoding, azure-data-factory

Check the CSV file encoding in Data Factory


I am implementing a pipeline that moves CSV files from one folder to another in a data lake, on the condition that each CSV file is encoded in UTF-8.

Is it possible to check the encoding of a CSV file directly in Data Factory/Data Flow?

Currently, the encoding is set in the connection settings of the dataset. What happens in this case if the encoding of the CSV file is different?

What happens at the database level if the CSV file is staged with the wrong encoding?

Thank you in advance.


Solution

  • For now, we can't check the file encoding in Data Factory/Data Flow directly. We must pre-set the encoding type used to read/write the text files in the dataset properties.

    Ref: https://learn.microsoft.com/en-us/azure/data-factory/format-delimited-text#dataset-properties

    The Data Factory default file encoding is UTF-8.

    As @wBob said, you need to implement the encoding check at the code level, for example in an Azure Function or a notebook, and call these activities from the pipeline.
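
    A minimal sketch of such a code-level check (e.g. inside an Azure Function or notebook activity): a byte stream is valid UTF-8 exactly when it decodes without error, so no external library is needed. In a real pipeline the bytes would be read from the data lake (for instance with the `azure-storage-blob` SDK); the in-memory samples here are just for illustration.

    ```python
    def is_utf8(data: bytes) -> bool:
        """Return True if the byte stream decodes cleanly as UTF-8."""
        try:
            data.decode("utf-8")
            return True
        except UnicodeDecodeError:
            return False

    # Illustrative payloads: the same text encoded two different ways.
    utf8_csv = "id;name\n1;Müller\n".encode("utf-8")
    latin1_csv = "id;name\n1;Müller\n".encode("latin-1")

    print(is_utf8(utf8_csv))    # True
    print(is_utf8(latin1_csv))  # False: 0xFC is not valid UTF-8
    ```

    Based on the result, the pipeline can branch (e.g. with an If Condition activity) to move the file or route it to an error folder.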

    HTH.