I've got an Athena table pointed to a list of JSON objects akin to:
CREATE EXTERNAL TABLE `example_table` (
`foo` struct<
`bar`: string>
)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'ignore.malformed.json' = 'true'
)
LOCATION
's3://bucket/json';
This bucket has a lifecycle policy so that files of a certain age go to Glacier. I would like to query the other files, but see the following error:
Your query has the following error(s):
HIVE_CURSOR_ERROR: com.amazonaws.services.s3.model.AmazonS3Exception:
The operation is not valid for the object's storage class (
Service: Amazon S3;
Status Code: 403;
Error Code: InvalidObjectState;
AWS claims to ignore Glacier objects per this update (2/2019): https://docs.aws.amazon.com/athena/latest/ug/release-note-2019-02-18.html
I don't get access denied, but rather InvalidObjectState. How can I create a table to query these JSON files?
Are your objects Glacier or Glacier Deep Archive. Seems the storage class of DEEP_ARCHIVE
is not ignored.
Are you or is there anyway you can structure your key to use a date time format i.e. 2020/05/14/blob.json
. If you can do this then you can partition your data which would stop you using data that has been modified by the lifecycle rule.
Additional URLs