Tags: azure, azure-storage, azure-blob-storage, replication

Copying storage data from one Azure account to another


I would like to copy a very large storage container from one Azure storage account into another (which also happens to be in another subscription).

I would like an opinion on the following options:

  1. Write a tool that connects to both storage accounts and copies blobs one at a time using CloudBlob's DownloadToStream() and UploadFromStream(). This seems to be the worst option: it will incur bandwidth charges for the transfer, and it will be slow because every blob has to be downloaded to the machine running the tool and then re-uploaded to Azure.

  2. Write a worker role to do the same. In theory this should be faster and avoid the transfer cost, but it is more work.

  3. Upload the tool to a running instance, bypassing the worker role deployment, and pray the tool finishes before the instance gets recycled/reset.

  4. Use an existing tool. I have not found anything suitable.

Any suggestions on the approach?

Update: I just found out that this functionality has finally been introduced (REST APIs only for now) for all storage accounts created on June 7th, 2012 or later:

http://msdn.microsoft.com/en-us/library/windowsazure/dd894037.aspx


Solution

  • Since there's no direct way to migrate data from one storage account to another, you'd need to do something like what you were thinking. If this is within the same data center, option #2 is the best bet, and will be the fastest (especially if you use an XL instance, giving you more network bandwidth).

    As for complexity, it's no more difficult to write this code in a worker role than in a local application. Just run the code from your worker role's Run() method.

    To make things more robust, you could list the blobs in your containers, then place file-move request messages into an Azure queue (and optimize by putting more than one object name per message). Then use a worker role thread to read from the queue and process the objects. Even if your role is recycled, at worst you'd reprocess one message. To increase throughput, you could then scale out to multiple worker role instances. Once the transfer is complete, you simply tear down the deployment.
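The queue-based approach above can be sketched roughly as follows. This is only an illustration in Python, not the worker role code itself: an in-memory `deque` stands in for the Azure queue, `copy_blob` is a hypothetical callback that would perform the actual copy, and `MAX_MESSAGE_BYTES` is an assumed budget chosen to stay under the queue's message size limit. The point is the shape of the logic: several blob names per message, and a message is deleted only after everything in it has been copied, so a recycle costs at most one reprocessed message.

```python
import json
from collections import deque

# Assumed per-message budget; pick a value below the real queue message limit.
MAX_MESSAGE_BYTES = 48 * 1024


def batch_blob_names(names, max_bytes=MAX_MESSAGE_BYTES):
    """Pack blob names into JSON-array messages of at most max_bytes each.

    A single name longer than max_bytes still gets a message of its own.
    """
    messages, batch, size = [], [], 2  # 2 bytes for the enclosing "[]"
    for name in names:
        # json.dumps escapes non-ASCII by default, so len() equals the byte count.
        item = len(json.dumps(name)) + 2  # quoted name plus ", " separator
        if batch and size + item > max_bytes:
            messages.append(json.dumps(batch))
            batch, size = [], 2
        batch.append(name)
        size += item
    if batch:
        messages.append(json.dumps(batch))
    return messages


def drain(queue, copy_blob, done):
    """Process queued messages; remove each one only after its blobs are copied.

    `done` is a set of names already copied, so a reprocessed message
    skips work that finished before a restart.
    """
    while queue:
        message = queue[0]  # peek: the message stays queued if we crash here
        for name in json.loads(message):
            if name not in done:
                copy_blob(name)
                done.add(name)
        queue.popleft()  # stand-in for deleting the message after success


if __name__ == "__main__":
    names = [f"logs/2012/{i:05d}.bin" for i in range(1000)]  # hypothetical blob names
    queue = deque(batch_blob_names(names, max_bytes=1024))
    copied = []
    drain(queue, copied.append, set())
    print(len(copied), "blobs copied")
```

With a real queue, the peek/pop pair would correspond to getting a message (which hides it for a visibility timeout) and deleting it once the batch is done. Running several workers against the same queue is what makes the scale-out step possible.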

    UPDATE - On June 12, 2012, the Windows Azure Storage API was updated, and now allows cross-account blob copy. See this blog post for all the details.
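For reference, a cross-account copy through the REST API boils down to a PUT against the destination blob with an `x-ms-copy-source` header naming the source blob. The sketch below only builds such a request with Python's standard library and does not send it. The account, container, and blob names are hypothetical, and the `Authorization` header is deliberately left out, so it would need to be added before the request could succeed. It also assumes the source blob is readable by the storage service (for example, public, or addressed with a shared access signature in its URL).

```python
from email.utils import formatdate
from urllib.request import Request

# Storage service version that, as far as I can tell, introduced cross-account copy.
COPY_API_VERSION = "2012-02-12"


def build_copy_blob_request(dest_blob_url, source_blob_url):
    """Build (but do not send) a Copy Blob request for the destination blob.

    Authentication is intentionally omitted from this sketch.
    """
    headers = {
        "x-ms-version": COPY_API_VERSION,
        "x-ms-date": formatdate(usegmt=True),  # RFC 1123 date in GMT
        "x-ms-copy-source": source_blob_url,
        "Content-Length": "0",  # Copy Blob has no request body
    }
    return Request(dest_blob_url, data=b"", headers=headers, method="PUT")


if __name__ == "__main__":
    request = build_copy_blob_request(
        "https://destaccount.blob.core.windows.net/backup/big.bin",  # hypothetical
        "https://srcaccount.blob.core.windows.net/data/big.bin",     # hypothetical
    )
    print(request.get_method(), request.full_url)
    for key, value in request.header_items():
        print(f"{key}: {value}")
```

The copy runs asynchronously on the service side: a successful request is accepted rather than completed, and the response carries copy ID and copy status headers that can be used to check progress on the destination blob later.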