I have parquet files written by pandas (pyarrow) with fields of double type. This is the parquet schema:
message schema {
  optional binary domain (STRING);
  optional binary type;
  optional binary country;
  optional binary region;
  optional binary country_code (STRING);
  optional int64 date (TIMESTAMP(MILLIS,true));
  optional double visits;
  optional double average_visit_duration;
}
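For context, here is roughly how these files are produced (a minimal sketch; the values and output path are made up):

import pandas as pd

# pandas float64 columns are written to parquet as "double"
df = pd.DataFrame({
    "domain": ["example.com"],
    "visits": [1234.0],                # float64 -> parquet double
    "average_visit_duration": [56.7],  # float64 -> parquet double
})
df.to_parquet("similarweb_daily.parquet", engine="pyarrow")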
Problem: I used Redshift Spectrum to create an external table to read the data in those parquet files. Table schema:
CREATE EXTERNAL TABLE spectrum.similarweb_daily_current(
  domain varchar(200),
  type varchar(200),
  country varchar(200),
  region varchar(200),
  country_code varchar(200),
  visits decimal(38,37),
  average_visit_duration decimal(38,37))
STORED AS PARQUET
LOCATION
's3://XXX'
When I run a simple SELECT query, it throws an error saying the schemas are incompatible: double vs. decimal. Does anyone have any idea how to solve this?
Research I did:
On pandas/pyarrow, it seems I can't change the schema to decimal when writing to parquet (see the sketch just below).
On Redshift, a plain double type isn't accepted for external tables (Spectrum).
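For illustration, this is roughly the cast I would need on the write side (a sketch; the column, value, and file name are made up):

import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"visits": pa.array([1.5], type=pa.float64())})
# Cast double -> decimal(38,37) before writing; depending on the
# pyarrow version, this cast is not implemented and raises
# ArrowNotImplementedError
decimal_schema = pa.schema([("visits", pa.decimal128(38, 37))])
pq.write_table(table.cast(decimal_schema), "visits_decimal.parquet")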
Solved this: use the Redshift "double precision" type for the double columns in the parquet files.
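The table definition should then look like this (same as above, with the two decimal columns switched to double precision):

CREATE EXTERNAL TABLE spectrum.similarweb_daily_current(
  domain varchar(200),
  type varchar(200),
  country varchar(200),
  region varchar(200),
  country_code varchar(200),
  visits double precision,
  average_visit_duration double precision)
STORED AS PARQUET
LOCATION
's3://XXX'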