I have written a script to recursively download through my Rackspace cloudfiles containers and retrieve a copy of every file, so I have a local backup in case Rackspace is hit by meteors and/or hindenbugs.
However, my script is apparently leaking memory at a linear rate as it downloads my files.
Basically I have a method that looks like this:
def download_file(fog_file, destination_path)
  data = fog_file.body
  File.open(destination_path, 'w') { |f| f.write(data) }
end
I understand that due to the nature of Fog, I cannot avoid loading an entire file into memory, but I would imagine that Ruby would release memory (or have the ability to release it) after each download_file invocation. After all, the data variable goes out of scope.
Unfortunately, when I look at my system monitoring, the memory usage just keeps increasing at a linear pace until it consumes all of my available memory, at which point the script crashes.
What am I doing wrong here?
I am using Ruby 2.1.2 on Ubuntu.
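One way I could narrow this down (a sketch only; files and backup_dir are placeholders for my real loop) would be to force a GC and log heap statistics between downloads:

# Diagnostic sketch: `files` and `backup_dir` are placeholders.
# If nothing is retained between iterations, the live-object counts
# reported by GC.stat should stay roughly flat.
files.each do |fog_file|
  download_file(fog_file, File.join(backup_dir, fog_file.key))
  GC.start                # force a full collection between downloads
  puts GC.stat.inspect    # log heap statistics after each file
end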
You can avoid loading the entire file into memory in two ways.
First, you can retrieve the file in 100 KB (or smaller) chunks:
service = Fog::Storage.new({
  provider: 'Rackspace',
  # ... auth config
  connection_options: { chunk_size: 102_400 } # 100 KB in bytes
})
directory = service.directories.get "dir"
File.open(destination_path, 'w') do |f|
  directory.files.get("my_file_on_cloud.png") do |data, remaining, content_length|
    f.syswrite data
  end
end
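Put together, a streaming replacement for your download_file helper could look something like this (a sketch, assuming directory is the Fog directory object fetched above and the file is addressed by its key):

# Sketch: chunks are written to disk as they arrive instead of
# buffering the whole body in a Ruby string.
def download_file_streaming(directory, key, destination_path)
  File.open(destination_path, 'wb') do |f|
    directory.files.get(key) do |data, remaining, content_length|
      f.syswrite data
    end
  end
end

download_file_streaming(directory, "my_file_on_cloud.png", destination_path)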
Second, you can retrieve the file's URL with Fog and then use OpenURI to download and save the file:
require 'open-uri'
data = open(fog_file.public_url).read
File.open(destination_path, 'w') { |f| f.write(data) }
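If you also want to avoid pulling the whole body into a Ruby string, IO.copy_stream can pipe the OpenURI handle straight into the destination file (a sketch, reusing the fog_file and destination_path names from above):

require 'open-uri'

# Sketch: copy the OpenURI handle into the destination file without
# holding the entire body in a Ruby string.
IO.copy_stream(open(fog_file.public_url), destination_path)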
The first method writes directly to the destination file; the second instead creates a Tempfile instance (a temporary file on the filesystem) while downloading. Try both.