I am trying to create an external table from a Parquet file stored in my Azure Data Lake.
But I am struggling with the datetime and date columns in Parquet files!
Here is the code with which I try to create the external table.
PS: I also tried using date and datetime but got errors... though varchar should at least yield something, right?
create external table [parquet_userdata]
(
registration_time [varchar] (100) null,
id [varchar] (10) null,
first_name [varchar] (100) null,
last_name [varchar] (100) null,
email [varchar] (100) null,
gender [varchar] (100) null,
ip_address [varchar] (100) null,
cc [varchar] (100) null,
country [varchar] (100) null,
birthday [varchar] (100) null,
salary [float] null,
title [varchar] (100) null,
comments [varchar] (2000) null
)
with(
-- specify the file in container
location= 'userdata.parquet',
data_source = parquet_location,
file_format = parquetformat
)
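For completeness: the statement assumes the external data source (parquet_location) and file format (parquetformat) already exist. Roughly, I created them along these lines (the storage URL below is a placeholder, not my actual account; private storage may additionally need a database scoped credential):
create external file format parquetformat
with (format_type = parquet);
create external data source parquet_location
with (
    -- placeholder URL; point this at your own storage account/container
    location = 'https://<storage-account>.dfs.core.windows.net/<container>'
);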
Here is the data that I used (I took userdata1.parquet):
https://github.com/Teradata/kylo/tree/master/samples/sample-data/parquet
In case of doubt about the usability of the data, the file content can be quickly validated here to see what it should look like:
https://www.parquet-viewer.com/#parquet-online
Here, the external table you're creating over Parquet files is sensitive to column names: you need to use the exact names from the file.
Below is the result I got for this command.
create external table [parquet_userdata_c]
(
c1 [varchar] (1000) null,
c2 [varchar] (10) null,
c3 [varchar] (100) null,
c4 [varchar] (100) null,
c5 [varchar] (100) null,
c6 [varchar] (100) null,
c7 [varchar] (100) null,
c8 [varchar] (100) null,
c9 [varchar] (100) null,
c10 [varchar] (1000) null,
c11 [float] null,
c12 [varchar] (100) null,
c13 [varchar] (2000) null
)
with(
location='/data/Databricks/userdata1.parquet',
data_source = parquet_location,
file_format = user_file_format
);
Output:
So, after adding the exact column names:
create external table [userdata]
(
registration_dttm DATETIME2(7) null,
id [varchar] (10) null,
first_name [varchar] (100) null,
last_name [varchar] (100) null,
email [varchar] (100) null,
gender [varchar] (100) null,
ip_address [varchar] (100) null,
cc [varchar] (100) null,
country [varchar] (100) null,
birthdate [varchar] (1000) null,
salary [float] null,
title [varchar] (100) null,
comments [varchar] (2000) null
)
with(
location='/data/Databricks/userdata1.parquet',
data_source = parquet_location,
file_format = user_file_format
);
Output:
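As a quick sanity check you can select a few rows; registration_dttm should now come back as a typed datetime value rather than as text (the column list here is just an example):
select top 5 registration_dttm, id, birthdate
from [userdata];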
If you don't know the column names, you can check them with OPENROWSET as below.
SELECT TOP 5 *
FROM OPENROWSET(
    BULK 'https://jgsblob.blob.core.windows.net/data/Databricks/userdata1.parquet',
    FORMAT = 'PARQUET'
) AS [result]
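If you want OPENROWSET itself to return typed columns instead of relying on type inference, you can also add a WITH clause that maps columns by name. A small sketch (the types here are my assumption about the file; adjust as needed):
SELECT TOP 5 *
FROM OPENROWSET(
    BULK 'https://jgsblob.blob.core.windows.net/data/Databricks/userdata1.parquet',
    FORMAT = 'PARQUET'
)
WITH (
    -- map only the columns you care about; names must match the parquet schema
    registration_dttm DATETIME2(7),
    id INT,
    birthdate VARCHAR(100)
) AS [result]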