Search code examples
amazon-web-servicesamazon-athenaamazon-glacier

AWS Athena querying mixed JSON objects that have some moved to Glacier


I've got an Athena table pointed to a list of JSON objects akin to:

CREATE EXTERNAL TABLE `example_table` (
`foo` struct<
  `bar`: string>
)
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'ignore.malformed.json' = 'true'
)
LOCATION 
  's3://bucket/json';

This bucket has a lifecycle policy so that files of a certain age go to Glacier. I would like to query the other files, but see the following error:

Your query has the following error(s):

HIVE_CURSOR_ERROR: com.amazonaws.services.s3.model.AmazonS3Exception: 
The operation is not valid for the object's storage class (
   Service: Amazon S3; 
   Status Code: 403; 
   Error Code: InvalidObjectState;

AWS claims to ignore Glacier objects per this update (2/2019): https://docs.aws.amazon.com/athena/latest/ug/release-note-2019-02-18.html

I don't get access denied, but rather InvalidObjectState. How can I create a table to query these JSON files?


Solution

  • Are your objects Glacier or Glacier Deep Archive. Seems the storage class of DEEP_ARCHIVE is not ignored.

    Are you or is there anyway you can structure your key to use a date time format i.e. 2020/05/14/blob.json. If you can do this then you can partition your data which would stop you using data that has been modified by the lifecycle rule.

    Additional URLs