We have a use case where we process data in Redshift, but I want to create a backup of these tables in S3 so that I can query them using Spectrum.
To move the tables from Redshift to S3 I am using a Glue ETL pipeline. I have created a crawler for AWS Redshift, and a Glue job converts the data to Parquet and stores it in S3, partitioned by date. Then another crawler crawls the S3 files to catalog the data again.
How can I eliminate the second crawler and do this in the job itself?
There is no need to use AWS Glue or Athena to unload Redshift data to S3 in Parquet format. Redshift now supports unloading data in Parquet format natively:
UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
FORMAT PARQUET
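
A concrete invocation might look like the following; the table name, bucket path, IAM role ARN, and date column are placeholders you would replace with your own. Note that UNLOAD also supports PARTITION BY, which matches the date partitioning your Glue job is doing today:

UNLOAD ('SELECT * FROM my_schema.my_table')   -- placeholder table
TO 's3://my-bucket/backup/my_table/'          -- placeholder prefix
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftUnloadRole'
FORMAT AS PARQUET
-- Writes Hive-style prefixes such as .../created_date=2020-01-01/
PARTITION BY (created_date);

By default the partition column is dropped from the Parquet files themselves; write PARTITION BY (created_date) INCLUDE if you want to keep it in the files as well.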
Documentation can be found at UNLOAD - Amazon Redshift.
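
Since your goal is to query the files with Spectrum, you can also eliminate the second crawler by declaring the external table once and registering each new date partition after an unload. A minimal sketch, assuming a Spectrum external schema named spectrum_schema already exists and using placeholder column names:

-- One-time definition pointing at the unload location
CREATE EXTERNAL TABLE spectrum_schema.my_table_backup (
    id BIGINT,           -- placeholder columns; match your table
    payload VARCHAR(256)
)
PARTITIONED BY (created_date DATE)
STORED AS PARQUET
LOCATION 's3://my-bucket/backup/my_table/';

-- Register each new date partition after an unload
ALTER TABLE spectrum_schema.my_table_backup
ADD IF NOT EXISTS PARTITION (created_date = '2020-01-01')
LOCATION 's3://my-bucket/backup/my_table/created_date=2020-01-01/';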