Tags: ruby-on-rails, heroku, amazon-s3

What is the best way to create a large ZIP file with Rails data and send it to Amazon S3 storage?


I have a feature in a Ruby on Rails application that backs up all of a user's account data so they can download it in a ZIP file and store it locally.

To create the ZIP file, I'm doing the following:

  1. Using Zip::OutputStream to open a ZIP file stream.

  2. Going through each of the relevant models in the user's account, converting all the records in that model to a CSV, then adding each CSV to the ZIP file.

  3. Sending the resulting ZIP file to AWS S3.

Here is some pseudo code to illustrate the process:

require 'zip' # rubyzip

# Build the whole archive in an in-memory buffer (a StringIO)
output_stream = Zip::OutputStream.write_buffer do |zos|
  @models_to_backup.each do |model|
    csv = model.convert_to_csv_file
    zos.put_next_entry("csv_files/#{model.name}.csv")
    zos.write csv
  end
end

# write_buffer leaves the buffer positioned at the end, so rewind before reading
output_stream.rewind
SendFileToS3(output_stream) # placeholder for the S3 upload step
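
SendFileToS3 above is just a placeholder. A minimal sketch of that upload step, assuming the aws-sdk-s3 gem (the bucket name, object key, and region below are hypothetical), might look like this:

require 'aws-sdk-s3'

# Hypothetical body for the SendFileToS3 placeholder above
def SendFileToS3(io)
  s3 = Aws::S3::Resource.new(region: 'us-east-1') # region is an assumption
  # Bucket name and object key are hypothetical
  object = s3.bucket('my-backup-bucket').object("backups/#{Time.now.to_i}.zip")
  object.put(body: io) # uploads the contents of the rewound buffer
end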

This works fine for smaller files, but most users have upwards of 100,000 records. As Zip::OutputStream generates the archive, I quickly run into memory issues (I'm hosting the app on Heroku) because the entire output stream is held in memory until it's sent.

Is there a more memory-efficient way to create these ZIP files? Is there a way to stream the ZIP to S3 in batches as it's created, to avoid building the entire archive in memory? Or will I just need to provision a server with a higher memory limit to accomplish this?


Solution

  • Answering my own question in case anyone sees this in the future. After looking into this for several days, I couldn't find a great solution.

    My somewhat hacky workaround is to temporarily switch to a Heroku Performance-L dyno (which has 14 GB of memory) when running the backups (they only have to happen once a month). Probably not the most elegant solution, but it works.
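
    If memory does need to stay flat, a lower-memory sketch (untested, and assuming the rubyzip and aws-sdk-s3 gems; the bucket name, key, and region are hypothetical) would be to write the archive to a Tempfile on disk instead of an in-memory buffer, then let upload_file stream it to S3 as a multipart upload:

    require 'zip'
    require 'aws-sdk-s3'
    require 'tempfile'

    tempfile = Tempfile.new(['account_backup', '.zip'])
    begin
      # Stream ZIP entries to a file on disk instead of a StringIO in memory
      Zip::OutputStream.open(tempfile.path) do |zos|
        @models_to_backup.each do |model|
          zos.put_next_entry("csv_files/#{model.name}.csv")
          zos.write model.convert_to_csv_file
        end
      end

      # upload_file reads the file in chunks and switches to a multipart
      # upload for large files, so the whole ZIP never sits in memory
      s3 = Aws::S3::Resource.new(region: 'us-east-1') # region is an assumption
      # Bucket name and object key are hypothetical
      object = s3.bucket('my-backup-bucket').object('backups/account_backup.zip')
      object.upload_file(tempfile.path)
    ensure
      tempfile.close
      tempfile.unlink
    end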