aws-glue

AWS Glue redshift_tmp_dir growing in size


As I understand it, when writing data to Redshift, Glue first writes it to a temporary S3 location and then uses Redshift's COPY command to load it from there.

I recently scanned our S3 buckets and noticed that the path one of our jobs uses for redshift_tmp_dir is growing in size, and not insignificantly!

So is it up to the developer to clear that location out at the end of a job? I had assumed the Glue processes took care of everything (naive, I guess!).


Solution

  • The easiest option is to set up S3 lifecycle rules that delete old files automatically.

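    Open the bucket in the S3 console, go to the "Management" tab, and add a lifecycle rule that expires (deletes) objects after X days. Limit the rule to the redshift_tmp_dir prefix so it does not touch other data in the bucket.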
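If you would rather script the rule than click through the console, here is a minimal sketch built around boto3's `put_bucket_lifecycle_configuration`. It assumes the temp dir is available as an S3 URI (for example, the value passed to the job as `--TempDir`). The bucket name, the rule ID `expire-redshift-tmp`, and the helper function names are hypothetical. The S3 client is passed in rather than created inside the code, so the rule-building logic can be checked without AWS credentials.

```python
from typing import Any, Dict, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an S3 URI such as 's3://my-bucket/glue-temp/' into (bucket, key prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Drop the leading '/' from the path. A trailing '/' is kept so the prefix
    # only matches objects inside that "folder".
    return parsed.netloc, parsed.path.lstrip("/")


def build_expiration_rule(
    prefix: str, days: int, rule_id: str = "expire-redshift-tmp"  # hypothetical ID
) -> Dict[str, Any]:
    """Build one lifecycle rule that expires objects under `prefix` after `days` days."""
    if not prefix:
        # An empty prefix would match every object in the bucket.
        raise ValueError("refusing to expire the whole bucket; pass a prefix")
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


def apply_expiration_rule(s3_client: Any, tmp_dir_uri: str, days: int) -> Dict[str, Any]:
    """Attach the expiration rule to the bucket that holds the temp dir.

    WARNING: put_bucket_lifecycle_configuration REPLACES the bucket's entire
    lifecycle configuration, so merge in any existing rules first if the
    bucket already has some.
    """
    bucket, prefix = split_s3_uri(tmp_dir_uri)
    config = {"Rules": [build_expiration_rule(prefix, days)]}
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
    return config
```

Usage would look like `apply_expiration_rule(boto3.client("s3"), "s3://my-bucket/glue-temp/", 7)`. The rule only needs to be set once per bucket; S3 then removes expired temp files in the background, so the Glue job itself needs no cleanup step.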