Given an S3 bucket partitioned by date in this way:
year
|___month
|___day
|___file_*.parquet
I am trying to create a table in amazon redshift Spectrum with this command:
create external table spectrum.visits(
ip varchar(100),
user_agent varchar(2000),
url varchar(10000),
referer varchar(10000),
session_id char(32),
store_id int,
category_id int,
page_id int,
product_id int,
customer_id int,
hour int
)
partitioned by (year char(4), month varchar(2), day varchar(2))
stored as parquet
location 's3://visits/visits-parquet/';
Although an error message is not thrown, the results of the queries are always null, i.e., do not return results. The bucket is not null. Does someone knows want am I doing wrong?
When an External Table is created in Amazon Redshift Spectrum, it does not scan for existing partitions. Therefore, Redshift is not aware that they exist.
You will need to execute an ALTER TABLE ... ADD PARTITION
command for each existing partition.
(Amazon Athena has a MSCK REPAIR TABLE
option, but Redshift Spectrum does not.)