Search code examples
sql-serverazure-synapsepolybase

Polybase - Error converting data type VARCHAR to DATETIME


I'm trying to create an external table to a CSV file stored in Azure Storage.

The CSV data looks like this :-

Date Rail Period Calendar Year Calendar Month Calendar Month Name Fiscal Year Fiscal Period Weekday Weekday Number
26/04/2021 2201 2021 4 April 2022 Period 1 Monday 1
27/04/2021 2201 2021 4 April 2022 Period 1 Tuesday 2
28/04/2021 2201 2021 4 April 2022 Period 1 Wednesday 3
29/04/2021 2201 2021 4 April 2022 Period 1 Thursday 4
30/04/2021 2201 2021 4 April 2022 Period 1 Friday 5
01/05/2021 2201 2021 5 May 2022 Period 2 Saturday 6
02/05/2021 2202 2021 5 May 2022 Period 2 Sunday 7
03/05/2021 2202 2021 5 May 2022 Period 2 Monday 1
04/05/2021 2202 2021 5 May 2022 Period 2 Tuesday 2

I've created the External File Format using the following code

CREATE EXTERNAL FILE FORMAT csvFile
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
      FIELD_TERMINATOR = ',',
      STRING_DELIMITER = '"',
      FIRST_ROW = 2,
      USE_TYPE_DEFAULT = TRUE,
      ENCODING = 'UTF8' )
);

and the External table as follows

CREATE EXTERNAL TABLE ext.DateDimension(
    [Date]                DATE, 
    [Rail Period]         INT, 
    [Calendar Year]       INT, 
    [Calendar Month]      INT,
    [Calendar Month Name] VARCHAR(9),
    [Fiscal Year]         INT, 
    [Fiscal Period]       VARCHAR(9), 
    [Weekday]             VARCHAR(9), 
    [Weekday Number]      INT)
WITH(
    DATA_SOURCE = [tfwpbstore_ADLSG2], 
    LOCATION = '/Generic Datasets/Date Dimension.csv',
    FILE_FORMAT = csvFile);

However, when I try to SELECT from the external table it gives me the following error

Msg 107090, Level 16, State 1, Line 1 HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopSqlException: Error converting data type VARCHAR to DATETIME.

and I'm not quite sure what's wrong. If anyone could help, I'd be most grateful.


Solution

  • As mentioned in the comments above I needed to define the date format used in the file format statement as follows:

    CREATE EXTERNAL FILE FORMAT csvFile_ddMMyyyy_fr2
    WITH (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (
          FIELD_TERMINATOR = ',',
          STRING_DELIMITER = '"',
         DATE_FORMAT = 'dd/MM/yyyy',
          FIRST_ROW = 2,
          USE_TYPE_DEFAULT = TRUE,
          ENCODING = 'UTF8' )
    );