Search code examples
t-sqlazure-synapse

CETAS TSQL query is not generating csv files with headers


I am writing a CETAS(create external table as select) query in TSQL using synapse. As all these CETAS queries generate a .csv file in a location (in storage account/data lake gen2), none of these files have headers.

Is there a possible solution or work-around using tsql or synapse analytics?

The following query is from azure documentation, this is also missing headers

-- use CETAS to export select statement with OPENROWSET result to  storage
CREATE EXTERNAL TABLE population_by_year_state
WITH (
    LOCATION = 'aggregated_data/',
    DATA_SOURCE = population_ds,  
    FILE_FORMAT = census_file_format
)  
AS
SELECT decennialTime, stateName, SUM(population) AS population
FROM
    OPENROWSET(BULK 'https://azureopendatastorage.blob.core.windows.net/censusdatacontainer/release/us_popula 
 tion_county/year=*/*.parquet',
    FORMAT='PARQUET') AS [r]
GROUP BY decennialTime, stateName
GO

-- you can query the newly created external table
SELECT * FROM population_by_year_state

In my case I have a JSON and other also result set of previously created external tables.


Solution

  • How is your file format defined? Here is the code we use that generates column headers:

    DROP EXTERNAL FILE FORMAT EXT_File_Format_CSV
    
    CREATE EXTERNAL FILE FORMAT EXT_File_Format_CSV
    WITH (  
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (
            FIELD_TERMINATOR = ',',
            STRING_DELIMITER = '"',
            PARSER_VERSION = '2.0',
            FIRST_ROW = 2
        )
    );