Search code examples
sqlamazon-athenaprestotrino

Athena query returns "INVALID_TABLE_PROPERTY" for daily partition projection preview table


I have created eight tables in AWS Athena using partition projection. Though the code is identical, seven are working correctly, but one table keeps throwing an error whenever I try to preview it. I have compared and see no difference from one table to the next, so it seems like everything should work. Does anyone have suggestions for troubleshooting this tidegauge table?

I should note that this is the one table that was renamed during a recent upgrade to our data pipelines.

Code to create the table:

CREATE EXTERNAL TABLE database1.tidegauge (
  `time` timestamp,
  `water_level` double,
  `qc_water_level` int
)
PARTITIONED BY(`date` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1')
LOCATION 's3://bucket/tide_gauge/parquet/'
TBLPROPERTIES (
  'has_encrypted_data'='false',
  'projection.enabled' = 'true',
  'projection.date.format' = 'yyyy/MM/dd', 
  'projection.date.interval' = '1', 
  'projection.date.interval.unit' = 'DAYS',
  'projection.date.range' = '2021/03/01, NOW+1DAYS',
  'projection.date.type' = 'date', 
  'storage.location.template'='s3://bucket/tide_gauge/parquet/${day}/'
);

The simple preview query:

SELECT * FROM "database1"."tidegauge" limit 10;

The query fails and throws the following error:

INVALID_TABLE_PROPERTY: The 'storage.location.template' table property must contain templates for all partition columns! Table: database1.tidegauge Template: s3://bucket/tide_gauge/parquet/${day}/ (table database1.tidegauge)

And for good measure, here is code to create another table that is working:

CREATE EXTERNAL TABLE database1.met (
  `time` timestamp,
  `airtemp_avg` double,
  `qc_airtemp_avg` int,
  `baro_pressure` double,
  `qc_baro_pressure` int,
  `par_avg` double,
  `qc_par_avg` int,
  `rain` double,
  `qc_rain` int,
  `rh` double,
  `qc_rh` int,
  `winddir` double,
  `qc_winddir` int,
  `windspeed_avg` double,
  `qc_windspeed_avg` int,
  `windspeed_max` double,
  `qc_windspeed_max` int,
  `lower_par` double,
  `qc_lower_par` int,
  `upper_par` double,
  `qc_upper_par` int
)
PARTITIONED BY(`date` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1')
LOCATION 's3://bucket/met/parquet/'
TBLPROPERTIES (
  'has_encrypted_data'='false',
  'projection.enabled' = 'true',
  'projection.date.format' = 'yyyy/MM/dd', 
  'projection.date.interval' = '1', 
  'projection.date.interval.unit' = 'DAYS',
  'projection.date.range' = '2021/03/01, NOW+1DAYS',
  'projection.date.type' = 'date', 
  'storage.location.template'='s3://bucket/met/parquet/${date}/'
);

Solution

  • It seems that you have mismatch in the partition column name - PARTITIONED BY(`date` string) but in the storage.location.template you have day - 's3://bucket/tide_gauge/parquet/${day}/', synchronize them, for example:

    'storage.location.template'='s3://bucket/tide_gauge/parquet/${date}/'
    

    Or the other way around.