azure, azure-blob-storage, azure-data-factory, high-availability, azcopy

How to sync two Azure blob containers in two different storage accounts on a schedule?


To achieve high availability, we have created two blob containers under different storage accounts in different Azure regions.

If the application runs into issues while writing to the primary blob container, it applies circuit-breaker logic. If the issue persists after a threshold number of attempts, the application starts writing to the standby storage account, which is located in a different Azure region. This architecture works fine.

Code used to switch from primary to secondary:

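// Lazily create the queue client once and reuse it
AsyncLazy<CloudQueue> qClient = new AsyncLazy<CloudQueue>(async () =>
{
    // "ConnectionString" and "QueueName" are placeholders for the real values
    var myStorageAccount = CloudStorageAccount.Parse("ConnectionString");
    var myQueue = myStorageAccount.CreateCloudQueueClient();
    myQueue.DefaultRequestOptions = new QueueRequestOptions
    {
        // Exponential back-off with a 3-second delta, at most 4 attempts
        RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(3), 4),
        // Send each request to the primary location first, then to the secondary
        LocationMode = LocationMode.PrimaryThenSecondary,
        // Upper bound for one operation, including all retries
        MaximumExecutionTime = TimeSpan.FromSeconds(20)
    };
    var queue = myQueue.GetQueueReference("QueueName");
    await queue.CreateIfNotExistsAsync();
    return queue;
});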

Now the problem is: once the primary account is back in action, how do we sync the data written to the standby location back to the blob container in the primary location?

AzCopy is one option for syncing two blob containers in two different accounts, but according to this link, an official Docker image of AzCopy is not available yet.

Is an official Docker image of AzCopy available? If not, what are the other suitable options for syncing two blob containers in two different accounts on a schedule? Azure Data Factory? An Azure Function?


Solution

  • First, as you may know, there is no official Docker image of AzCopy available; the GitHub issue mentions this.

    Yes, you can use an Azure Function for this, though it may be a little difficult. (I am not sure about ADF; the people I asked said it is not easy to do there.)

    The easier solution is to use an Azure WebJob together with AzCopy. Just set the WebJob to run on a schedule when you create it in the Azure portal. Azure WebJobs support many file types, such as .ps1 (PowerShell), .cmd, and .py, so you can write the job in whichever you prefer.

    Here, I will create a .ps1 (PowerShell) file and upload it as an Azure WebJob that runs the sync at the scheduled time.

    Step 1: Create a .ps1 file. The file must be named run.ps1. Put the code below in run.ps1 (replace the source and destination storage URLs with your own):

    # Source and destination containers; each URL must include a SAS token.
    $source_container = "https://yy1.blob.core.windows.net/a11?sv=2020-02-10&ss=bfqt&srt=sco&sp=rwdlacup&se=2021-03-12T11:00:50Z&st=2021-03-12T03:00:50Z&spr=https&sig=xxxxxx"
    $dest_container = "https://yyasia1.blob.core.windows.net/test123?sv=2020-02-10&ss=bfqt&srt=sco&sp=rwdlacup&se=2021-03-12T11:01:36Z&st=2021-03-12T03:01:36Z&spr=https&sig=xxxxxx"

    # Directory that contains this run.ps1 file and azcopy.exe
    $path = Split-Path -Parent $MyInvocation.MyCommand.Definition

    # Change to that directory so that .\azcopy.exe resolves
    Set-Location -Path $path

    # Run the sync from the source container to the destination container
    .\azcopy.exe sync $source_container $dest_container --recursive

    # Propagate a failure so the WebJob run is reported as failed
    if ($LASTEXITCODE -ne 0) {
        Write-Error "azcopy sync failed with exit code $LASTEXITCODE"
        exit $LASTEXITCODE
    }

    Write-Output "**completed**"
    

    Step 2: Put azcopy.exe in the same folder as run.ps1 (download it first if you don't have it). Then zip the two files into a single .zip file. Before zipping, test the script locally to confirm that it works.
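    The local test and packaging in Step 2 could be sketched as follows. This assumes run.ps1 and azcopy.exe are in the current folder and that the AzCopy build is a recent v10 release that supports the --dry-run flag; the URL placeholders and the archive name syncjob.zip are illustrative and not from the original answer.

    ```powershell
    # Assumption: the same SAS URLs as in run.ps1 (placeholders here)
    $source_container = "<source container URL with SAS token>"
    $dest_container   = "<destination container URL with SAS token>"

    # Preview what sync would copy, without transferring anything
    # (requires an AzCopy v10 build that supports --dry-run)
    .\azcopy.exe sync $source_container $dest_container --recursive --dry-run

    # Package the script and the executable; syncjob.zip is a hypothetical name
    Compress-Archive -Path .\run.ps1, .\azcopy.exe -DestinationPath .\syncjob.zip -Force
    ```

    If the dry run lists the blobs you expect, run run.ps1 once for real before uploading the archive.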

    Step 3: This assumes you already have an Azure web app (App Service) that supports WebJobs and has the Always On feature enabled. In the Azure portal, go to your web app -> Settings -> WebJobs and click "Add". In the "Add WebJob" panel, fill in the required fields, choose the .zip file, set Type to Triggered and Triggers to Scheduled, and enter a CRON expression.
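    For the CRON field, WebJobs use a six-field NCRONTAB expression: {second} {minute} {hour} {day} {month} {day-of-week}. As a sketch, assuming an hourly sync is wanted (the interval is an assumption, not something stated above), the expression would be 0 0 * * * *. The same schedule can alternatively be shipped in a settings.job file placed at the root of the .zip, next to run.ps1:

    ```json
    {
      "schedule": "0 0 * * * *"
    }
    ```

    Pick an interval that matches how quickly the standby data needs to reach the primary account.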

    Step 4: After the WebJob is created (this may take a few minutes; click the "Refresh" button to check), select the WebJob and click the "Run" button to trigger it once manually.