I am reading parquet file and I found the two columns that cause this problem:
SELECT
TOP 100
POST_SEARCH_DATE,
SEARCH_DATE
FROM
OPENROWSET(
BULK 'https://test/refined-parquet/data/v1.0/loaddt=2022-01-01/**',
FORMAT = 'PARQUET'
) AS [result]
These columns contain values that are undefined and I have tried the following to get rid of the error, yet it persists.
SELECT
TOP 100
POST_SEARCH_DATE,
SEARCH_DATE
FROM
OPENROWSET(
BULK 'https://test/refined-parquet/data/v1.0/loaddt=2022-01-01/**',
FORMAT = 'PARQUET'
) AS [result]
WHERE cast(POST_SEARCH_DEPARTURE_DATE as varchar(100))!= 'undefined'
and cast (SEARCH_DEPARTURE_DATE as varchar(100)) != 'undefined'
The code from above will throw the same:
Error handling external file: 'Inserting value to batch for column type DATE failed. Invalid argument provided.'.
I also tried the following for one column:
SELECT
TOP 100
POST_SEARCH_DATE,
SEARCH_DATE
FROM
OPENROWSET(
BULK 'https://test/refined-parquet/data/v1.0/loaddt=2022-01-01/**',
FORMAT = 'PARQUET'
) with (POST_SEARCH_DATEVARCHAR(60)) AS [result]
But this throws the following:
Error handling external file: 'Converting value to batch for column POST_SEARCH_DEPARTURE_DATE failed. Invalid argument provided.'
As instructed, no rows cannot be removed from this raw file and I am unsure how to solve this problem. Any suggestion is welcome.
The solution is to use the with clause and define the data type that you expect to receive when you perform the open row set ( see below ):
SELECT
TOP 100
POST_SEARCH_DATE,
SEARCH_DATE
FROM
OPENROWSET(
BULK 'https://test/refined-parquet/data/v1.0/loaddt=2021-01-01/**',
FORMAT = 'PARQUET'
WITH (
[SEARCH_DEPARTURE_DATE] VARCHAR(100) COLLATE Latin1_General_BIN2 3,
[POST_SEARCH_DEPARTURE_DATE] VARCHAR(100) COLLATE Latin1_General_BIN2 2
) AS [result]