I have a data warehouse maintained in AWS Redshift. The data volume and velocity both have increased lately. One option is to keep scaling the cluster horizontally at the expanse of a higher cost of course. I was wondering if there are any archiving options available so that I can query the entire data as usual (maybe with a compromise in the querying time) but with a low or no additional cost?
One option would be to use external tables and query data directly from S3 but the tools used for achieving this, like Athena and Glue have their own cost, that too on a per query basis.
Easy options:
SELECT * FROM svv_table_info;
.ENCODE zstd
.DISTSTYLE ALL
(yes this saves space!).dc2
) to HDD nodes (ds2
) which have more 8x storage space.Less easy options:
UNLOAD
older data from Redshift to S3 and query using Redshift Spectrum.Please experiment with Redshift Spectrum. Query performance is typically very good and gets even better if your data is in a columnar format (Parquet/ORC).