I have created eight tables in AWS Athena using partition projection. Though the code is identical, seven are working correctly, but one table keeps throwing an error whenever I try to preview it. I have compared and see no difference from one table to the next, so it seems like everything should work. Does anyone have suggestions for troubleshooting this tidegauge table?
I should note that this is the one table that was renamed during a recent upgrade to our data pipelines.
Code to create the table:
CREATE EXTERNAL TABLE database1.tidegauge (
`time` timestamp,
`water_level` double,
`qc_water_level` int
)
PARTITIONED BY(`date` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1')
LOCATION 's3://bucket/tide_gauge/parquet/'
TBLPROPERTIES (
'has_encrypted_data'='false',
'projection.enabled' = 'true',
'projection.date.format' = 'yyyy/MM/dd',
'projection.date.interval' = '1',
'projection.date.interval.unit' = 'DAYS',
'projection.date.range' = '2021/03/01, NOW+1DAYS',
'projection.date.type' = 'date',
'storage.location.template'='s3://bucket/tide_gauge/parquet/${day}/'
);
The simple preview query:
SELECT * FROM "database1"."tidegauge" limit 10;
The query fails and throws the following error:
INVALID_TABLE_PROPERTY: The 'storage.location.template' table property must contain templates for all partition columns! Table: database1.tidegauge Template: s3://bucket/tide_gauge/parquet/${day}/ (table database1.tidegauge)
And for good measure, here is code to create another table that is working:
CREATE EXTERNAL TABLE database1.met (
`time` timestamp,
`airtemp_avg` double,
`qc_airtemp_avg` int,
`baro_pressure` double,
`qc_baro_pressure` int,
`par_avg` double,
`qc_par_avg` int,
`rain` double,
`qc_rain` int,
`rh` double,
`qc_rh` int,
`winddir` double,
`qc_winddir` int,
`windspeed_avg` double,
`qc_windspeed_avg` int,
`windspeed_max` double,
`qc_windspeed_max` int,
`lower_par` double,
`qc_lower_par` int,
`upper_par` double,
`qc_upper_par` int
)
PARTITIONED BY(`date` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1')
LOCATION 's3://bucket/met/parquet/'
TBLPROPERTIES (
'has_encrypted_data'='false',
'projection.enabled' = 'true',
'projection.date.format' = 'yyyy/MM/dd',
'projection.date.interval' = '1',
'projection.date.interval.unit' = 'DAYS',
'projection.date.range' = '2021/03/01, NOW+1DAYS',
'projection.date.type' = 'date',
'storage.location.template'='s3://bucket/met/parquet/${date}/'
);
It seems that you have mismatch in the partition column name - PARTITIONED BY(`date` string)
but in the storage.location.template
you have day
- 's3://bucket/tide_gauge/parquet/${day}/'
, synchronize them, for example:
'storage.location.template'='s3://bucket/tide_gauge/parquet/${date}/'
Or the other way around.