Tags: ruby, azure, chef-infra, azure-storage, azure-blob-storage

NoMemoryError when downloading Azure Blob in Ruby


Environment:

  • Windows 10 x64
  • Ruby 2.1.0 32 bit
  • Chef 12.12.15
  • Azure Gem 0.7.9
  • Azure-Storage Gem 0.12.1.preview

I am trying to download a ~880MB blob from a container. When I do, it throws the following error after the Ruby process hits ~500MB in size:

C:/opscode/chefdk/embedded/lib/ruby/2.1.0/net/protocol.rb:102:in `read': failed to allocate memory (NoMemoryError)

I have tried this both inside and outside of Chef, and with both the Azure gem and the Azure-Storage gem. The result is the same with all four combinations (Azure in Chef, Azure in Ruby, Azure-Storage in Chef, Azure-Storage in Ruby).

Most of the troubleshooting I have found for these kinds of problems suggests streaming or chunking the download, but there does not appear to be a corresponding method or get_blob option to do so.

Code:

require 'azure/storage'

# vars
account_name = "myacct"
container_name = "myfiles"
access_key = "mykey"
installs_dir = "myinstalls"

# directory for files
create_dir = 'c:/' + installs_dir
Dir.mkdir(create_dir) unless File.exist?(create_dir)

# create azure client
Azure::Storage.setup(:storage_account_name => account_name, :storage_access_key => access_key)
azBlobs = Azure::Storage::Blob::BlobService.new

# get list of blobs in container
dlBlobs = azBlobs.list_blobs(container_name)

# download each blob to directory
dlBlobs.each do |dlBlob|
  puts "Downloading " + container_name + "/" + dlBlob.name
  portalBlob, blobContent = azBlobs.get_blob(container_name, dlBlob.name)
  File.open("c:/" + installs_dir + "/" + portalBlob.name, "wb") do |f|
    f.write(blobContent)
  end
end

I also tried using IO.binwrite() instead of File.open() and got the same result.

Suggestions?


Solution

  • As @coderanger said, your issue was caused by get_blob loading the entire blob into memory at once. There are two ways to resolve it.

    1. According to the official REST reference, quoted below:

    The maximum size for a block blob created via Put Blob is 256 MB for version 2016-05-31 and later, and 64 MB for older versions. If your blob is larger than 256 MB for version 2016-05-31 and later, or 64 MB for older versions, you must upload it as a set of blocks. For more information, see the Put Block and Put Block List operations. It's not necessary to also call Put Blob if you upload the blob as a set of blocks.

    So for a blob that consists of blocks, you can get the block list via list_blob_blocks and then download the corresponding byte ranges one by one, writing each to the local file (see the first sketch below).

    2. Generate a blob URL with a SAS token via signed_uri (as in this test code), then download the blob via streaming and write it to a local file (see the second sketch below).
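
A minimal sketch of the first approach. The account, container, and blob names are placeholders, and it assumes get_blob accepts the azure-storage gem's :start_range/:end_range options to fetch a single byte range per call:

require 'azure/storage'

account_name   = "myacct"       # placeholder
access_key     = "mykey"        # placeholder
container_name = "myfiles"      # placeholder
blob_name      = "bigfile.zip"  # placeholder
local_path     = "c:/myinstalls/" + blob_name

Azure::Storage.setup(:storage_account_name => account_name, :storage_access_key => access_key)
blob_service = Azure::Storage::Blob::BlobService.new

# The committed block list reports each block's size, so we can walk
# the blob range by range instead of pulling it down in one call.
blocks = blob_service.list_blob_blocks(container_name, blob_name)[:committed]

File.open(local_path, "wb") do |f|
  offset = 0
  blocks.each do |block|
    # Fetch only this block's byte range; only one block is in memory at a time.
    _, content = blob_service.get_blob(container_name, blob_name,
                                       :start_range => offset,
                                       :end_range   => offset + block.size - 1)
    f.write(content)
    offset += block.size
  end
end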
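
And a sketch of the second approach, streaming over plain HTTP with a SAS URL. The signed_uri signature has changed between azure-storage versions (older releases take only a URI and an options hash), so treat the call below as illustrative:

require 'azure/storage'
require 'net/http'
require 'time'

# Same placeholder account/container/blob variables as in the first sketch.
Azure::Storage.setup(:storage_account_name => account_name, :storage_access_key => access_key)
blob_service = Azure::Storage::Blob::BlobService.new

# Build a read-only SAS URL for the blob; generate_uri is assumed to
# return the blob endpoint URI for the given path.
generator = Azure::Storage::Core::Auth::SharedAccessSignature.new(account_name, access_key)
blob_uri  = blob_service.generate_uri(container_name + "/" + blob_name)
sas_uri   = generator.signed_uri(blob_uri, false,
                                 :service     => "b",
                                 :resource    => "b",
                                 :permissions => "r",
                                 :expiry      => (Time.now + 3600).utc.iso8601)

# Stream the HTTP response body in chunks; Net::HTTP yields each chunk to
# the block, so the whole ~880MB blob never sits in memory at once.
uri = URI(sas_uri.to_s)
Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == "https") do |http|
  http.request(Net::HTTP::Get.new(uri)) do |response|
    File.open(local_path, "wb") do |f|
      response.read_body { |chunk| f.write(chunk) }
    end
  end
end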