Search code examples
sqlsql-serverazure-sql-databaseparquetazure-synapse

Error handling external file: 'Inserting value to batch for column type DATE failed. Invalid argument provided.'


I am reading parquet file and I found the two columns that cause this problem:

SELECT
TOP 100 
POST_SEARCH_DATE,
SEARCH_DATE 

FROM
OPENROWSET(
    BULK 'https://test/refined-parquet/data/v1.0/loaddt=2022-01-01/**',
    FORMAT = 'PARQUET'
    
) AS [result]

These columns contain values that are undefined and I have tried the following to get rid of the error, yet it persists.

SELECT
TOP 100 
POST_SEARCH_DATE,
SEARCH_DATE

FROM
OPENROWSET(
    BULK 'https://test/refined-parquet/data/v1.0/loaddt=2022-01-01/**',
    FORMAT = 'PARQUET'
    
) AS [result]
WHERE cast(POST_SEARCH_DEPARTURE_DATE as varchar(100))!= 'undefined'
and cast (SEARCH_DEPARTURE_DATE as varchar(100)) != 'undefined'

The code from above will throw the same:

Error handling external file: 'Inserting value to batch for column type DATE failed. Invalid argument provided.'.

I also tried the following for one column:

SELECT
TOP 100 
POST_SEARCH_DATE,
SEARCH_DATE

FROM
OPENROWSET(
    BULK 'https://test/refined-parquet/data/v1.0/loaddt=2022-01-01/**',
    FORMAT = 'PARQUET'
    
) with (POST_SEARCH_DATEVARCHAR(60)) AS [result]

But this throws the following:

Error handling external file: 'Converting value to batch for column POST_SEARCH_DEPARTURE_DATE failed. Invalid argument provided.'

As instructed, no rows cannot be removed from this raw file and I am unsure how to solve this problem. Any suggestion is welcome.


Solution

  • The solution is to use the with clause and define the data type that you expect to receive when you perform the open row set ( see below ):

    SELECT
      TOP 100 
      POST_SEARCH_DATE,
      SEARCH_DATE
    
    FROM
       OPENROWSET(
          BULK 'https://test/refined-parquet/data/v1.0/loaddt=2021-01-01/**',
          FORMAT = 'PARQUET'
         WITH (
            [SEARCH_DEPARTURE_DATE] VARCHAR(100) COLLATE Latin1_General_BIN2 3,
            [POST_SEARCH_DEPARTURE_DATE] VARCHAR(100) COLLATE Latin1_General_BIN2 2
    ) AS [result]