I have taken pg dump of large tables (>50 GB) in AWS S3 in csv.gz format from postgresql RDS. I want to store this data into AWS athena in the form of actual tables as in the RDS, so that when requirement is there i can directly use the data without restoring those tables in the RDS.
Any suggestions will be much appreciated.
If each file contains data from one table in CSV format, you can create a table to query that data.
I find the easiest way is to use AWS Glue:
If you wish to improve performance of the queries, I would recommend that you convert the data into Snappy-compressed Parquet format.
From Examples of CTAS queries - Amazon Athena:
CREATE TABLE new_table
WITH (
format = 'Parquet',
write_compression = 'SNAPPY')
AS SELECT *
FROM old_table;