Search code examples
azure-synapseparallel-data-warehouse

Azure Data Warehouse PolyBase File format


We have a file that looks like this:

Col1,Col2,Col3,Col4,Col5
"Hello,",I,",am",some,data!

It therefore has the following 'properties':

  • Comma-separated
  • Double-quote column delimiter
  • Commas in some of the columns

Now, I am not sure if it's actually possible to ingest this with PolyBase, but wondered if there was a way?

The error we are seeing at present is "Could not find a delimiter after quote".. which i guess is because after the double quote it is hitting what is an expected delimiter..

Here is our current file format, for completeness:

CREATE EXTERNAL FILE FORMAT Comma
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS(
          FIELD_TERMINATOR = ',',
          STRING_DELIMITER = '"',
    )
)

Solution

  • Sorted this out in the end by adding an intermediary step to convert the file from csv to ORC format..

    It's a bit clunky (as it leaves a mess of a copy behind), but the PolyBase then does work with the fileformat:

    CREATE EXTERNAL FILE FORMAT Orc
    WITH (FORMAT_TYPE = ORC)
    

    works for now, until it is addressed by the product team: https://feedback.azure.com/forums/307516-sql-data-warehouse/suggestions/10600132-polybase-allow-field-row-terminators-within-strin