amazon-web-services aws-glue amazon-athena aws-glue-data-catalog

Athena best practice for storing query results

I am creating a data lake and have some tables in the Glue Data Catalog that I need to query in Athena. As a prerequisite, Athena requires us to store the query results in an S3 bucket. I already have "Temp" and "Logs" S3 buckets. Since this is client-sensitive data, I want to check whether I should create a new bucket for Athena or reuse the existing temp/logs buckets.

Note: I don't have any future use for the Athena query results.

Solution

That's a good point: the output files that Amazon Athena writes contain whatever the queries return, including any sensitive data.

You could create a dedicated bucket that is effectively write-only: attach a bucket policy with a Deny on s3:GetObject so that nobody can read objects from the bucket. That way, Athena can still write its output, but people can't see the results.
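As a rough sketch of such a Deny policy, the Python below builds the policy document and shows how it could be attached with boto3. The bucket name `my-athena-results-bucket`, the `Sid`, and the helper names are made up for illustration, and the policy denies reads to every principal, so narrow the `Principal` or add conditions if some role still needs access.

```python
import json


def build_deny_read_policy(bucket_name: str) -> dict:
    """Build a bucket policy that denies s3:GetObject on every object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyReadOfAthenaResults",  # hypothetical label
                "Effect": "Deny",
                "Principal": "*",  # assumption: block reads for everyone
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


def apply_deny_read_policy(bucket_name: str) -> None:
    """Attach the policy to the bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_policy(
        Bucket=bucket_name,
        Policy=json.dumps(build_deny_read_policy(bucket_name)),
    )


if __name__ == "__main__":
    # Print the policy so it can be reviewed before it is applied.
    print(json.dumps(build_deny_read_policy("my-athena-results-bucket"), indent=2))
```

The same JSON can be pasted into the bucket's Permissions tab in the S3 console instead of being applied from code.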

You could also apply an Amazon S3 lifecycle policy that deletes the files after one day.
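A one-day expiration rule might look roughly like this with boto3. The rule ID and function names are placeholders, and the empty prefix is an assumption that the whole bucket is dedicated to Athena output. S3 removes expired objects asynchronously, so files can linger for a while after they become eligible.

```python
def build_expiry_rules(days: int = 1) -> dict:
    """Build a lifecycle configuration that expires every object after `days`."""
    return {
        "Rules": [
            {
                "ID": "expire-athena-results",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_expiry_rules(bucket_name: str, days: int = 1) -> None:
    """Apply the lifecycle configuration (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=build_expiry_rules(days),
    )
```

Note that applying a lifecycle configuration this way replaces any lifecycle rules the bucket already has, which is one more reason to use a dedicated bucket rather than the existing temp or logs bucket.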

An alternative would be to trigger an AWS Lambda function as soon as each object is created and have the function delete the object.
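A minimal sketch of such a function, assuming the bucket sends `s3:ObjectCreated:*` event notifications to it and its execution role allows `s3:DeleteObject` on the bucket. The extra `s3_client` parameter is not part of the Lambda interface; it is only there so the handler can be exercised locally with a stub.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context, s3_client=None):
    """Delete every object named in an S3 ObjectCreated event notification."""
    if s3_client is None:
        import boto3  # available by default in the Lambda Python runtime

        s3_client = boto3.client("s3")

    deleted = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        s3_client.delete_object(Bucket=bucket, Key=key)
        deleted.append(key)
    return {"deleted": deleted}
```

Because the delete runs right after the object lands, anything that tries to read the result file afterwards will not find it, so this suits queries whose output is never needed.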

Either way, ask people to direct their Athena output to that bucket if they don't need to access the results, or if there is sensitive data being retrieved.