I have a feature in a Ruby on Rails application that backs up all of a user's account data so they can download it in a ZIP file and store it locally.
To create the ZIP file, I'm doing the following:
1. Using Zip::OutputStream to open a ZIP file stream.
2. Going through each of the relevant models in the user's account, converting that model's records to a CSV, then adding each CSV to the ZIP file.
3. Sending the resulting ZIP file to AWS S3.
Here is some pseudocode to illustrate the process:
require 'zip'

# Build the whole archive into an in-memory buffer (a StringIO).
output_stream = Zip::OutputStream.write_buffer do |zos|
  @models_to_backup.each do |model|
    csv = model.convert_to_csv_file          # dump the model's records to CSV
    zos.put_next_entry("csv_files/#{model.name}.csv")
    zos.write(csv)
  end
end

output_stream.rewind
send_file_to_s3(output_stream)               # placeholder for the actual S3 upload
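In case it helps, here's a simplified sketch of what those two helpers might look like (the module name, method names, and the S3_BUCKET env var below are just illustrative, using the standard CSV library and the aws-sdk-s3 gem; my real code is more involved):

require 'csv'
require 'aws-sdk-s3'

# Illustrative class-level helper so `model.convert_to_csv_file` works for
# any ActiveRecord model included in the backup.
module CsvBackup
  def convert_to_csv_file
    CSV.generate do |csv|
      csv << column_names                    # header row
      find_each { |record| csv << record.attributes.values_at(*column_names) }
    end
  end
end
# e.g. class Invoice < ApplicationRecord; extend CsvBackup; end

# Illustrative stand-in for send_file_to_s3: a single PUT of the whole buffer.
def send_file_to_s3(io, key = 'backups/backup.zip')
  bucket = Aws::S3::Resource.new.bucket(ENV.fetch('S3_BUCKET'))
  bucket.object(key).put(body: io)
end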
This works fine for smaller accounts, but most users have upwards of 100,000 records. As Zip::OutputStream builds the archive, I quickly run into memory issues (I'm hosting the app on Heroku) because the entire output stream is held in memory until it's sent.
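To make the problem concrete, here's a minimal illustration of my understanding of write_buffer (based on my reading of rubyzip, where the default buffer it writes into is an in-memory StringIO):

# Every compressed byte ends up in one StringIO that lives until the upload finishes.
buffer = Zip::OutputStream.write_buffer do |zos|
  zos.put_next_entry('example.csv')
  zos.write('a' * 10_000_000)                # ~10 MB of CSV data
end
buffer.class   # => StringIO
buffer.size    # => the size of the finished ZIP, all held in RAM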
Is there a more memory-efficient way to create these ZIP files? Is there a way to stream the ZIP to S3 in chunks as it's created, so the entire archive never has to exist in memory at once? Or will I just need to provision a server with a higher memory limit to accomplish this?
Answering my own question in case anyone sees this in the future. After looking into this for several days, I couldn't find a great solution.
My somewhat hacky workaround is to temporarily switch to a Heroku Performance-L dyno (which has 14 GB of memory) when running the backups, which only has to happen once a month. Probably not the most elegant solution, but it works.
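If anyone wants to do the same, one way to get the bigger dyno only for the backup job is Heroku's one-off dynos with a size flag, something along these lines (the rake task name here is just a placeholder):

heroku run --size=performance-l rake backups:generate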