I have written a script to recursively download through my Rackspace cloudfiles containers and retrieve a copy of every file, so I have a local backup in case Rackspace is hit by meteors and/or hindenbugs.
However, my script is apparently leaking memory at a linear rate as it downloads my files.
Basically I have a method that looks like this:
def download_file(fog_file, destination_path)
  data = fog_file.body
  File.open(destination_path, 'w') { |f| f.write(data) }
end
I understand that due to the nature of Fog, I cannot avoid loading an entire file into memory, but I would imagine that Ruby would release memory (or have the ability to release it) after each download_file invocation. After all, the data variable goes out of scope.
Unfortunately, when I look at my system monitoring, the memory usage just keeps increasing at a linear pace until it consumes all of my available memory, at which point the script crashes.
What am I doing wrong here?
I am using Ruby 2.1.2 on Ubuntu.
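One way I could narrow this down (a sketch only; files and backup_dir are placeholders for my real loop) would be to force a GC and log heap statistics between downloads:

# Diagnostic sketch: `files` and `backup_dir` are placeholders.
# If nothing is retained between iterations, the live-object counts
# reported by GC.stat should stay roughly flat.
files.each do |fog_file|
  download_file(fog_file, File.join(backup_dir, fog_file.key))
  GC.start                # force a full collection between downloads
  puts GC.stat.inspect    # log heap statistics after each file
end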
You can avoid loading the entire file into memory in two ways.
First, you can retrieve the file in 100 KB (or smaller) chunks:
service = Fog::Storage.new({
  provider: 'Rackspace',
  # ... auth config
  connection_options: { chunk_size: 102_400 } # 100 KB in bytes
})
directory = service.directories.get "dir"
File.open(destination_path, 'w') do |f|
  directory.files.get("my_file_on_cloud.png") do |data, remaining, content_length|
    f.syswrite data
  end
end
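Put together, a streaming replacement for your download_file helper could look something like this (a sketch, assuming directory is the Fog directory object fetched above and the file is addressed by its key):

# Sketch: chunks are written to disk as they arrive instead of
# buffering the whole body in a Ruby string.
def download_file_streaming(directory, key, destination_path)
  File.open(destination_path, 'wb') do |f|
    directory.files.get(key) do |data, remaining, content_length|
      f.syswrite data
    end
  end
end

download_file_streaming(directory, "my_file_on_cloud.png", destination_path)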
Second, you can retrieve the file's URL with Fog and then use OpenURI to download and save the file:
require 'open-uri'
data = open(fog_file.public_url).read
File.open(destination_path, 'w') { |f| f.write(data) }
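If you also want to avoid pulling the whole body into a Ruby string, IO.copy_stream can pipe the OpenURI handle straight into the destination file (a sketch, reusing the fog_file and destination_path names from above):

require 'open-uri'

# Sketch: copy the OpenURI handle into the destination file without
# holding the entire body in a Ruby string.
IO.copy_stream(open(fog_file.public_url), destination_path)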
The first method writes directly to the destination file; the second instead creates a Tempfile instance (a temporary file on the filesystem) while downloading. Try both.