amazon-web-services azure amazon-s3 azure-storage

Migrate a large file from AWS S3 to Azure Blob Storage and monitor the migration progress in Java


I've written code that first downloads the file from AWS and then uploads it to Azure. I also need to monitor the full progress of the migration. However, this approach consumes a lot of bandwidth and time, and it gives no way to monitor the data being transferred. What is the best way to make a reliable transfer from S3 to Blob Storage while monitoring the migration?

        // Download from AWS S3
        BasicAWSCredentials awsCreds = new BasicAWSCredentials(t.getDaccID(), t.getDaccKey());
        AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.fromName("us-east-2"))
                .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
                .build();

        S3Object s3object = s3Client.getObject(new GetObjectRequest(t.getDbucket(), t.getDfileName()));
        // Reads the entire object into memory before the upload starts
        byte[] bytes = IOUtils.toByteArray(s3object.getObjectContent());

        // Upload to Azure Blob Storage
        String connStr = "DefaultEndpointsProtocol=https;AccountName=" + t.getUaccID()
                + ";AccountKey=" + t.getUaccKey() + ";EndpointSuffix=core.windows.net";

        CloudStorageAccount cloudStorageAccount = CloudStorageAccount.parse(connStr);
        CloudBlobClient blobClient = cloudStorageAccount.createCloudBlobClient();
        CloudBlobContainer container = blobClient.getContainerReference(t.getUbucket());
        CloudBlockBlob blob = container.getBlockBlobReference(t.getDfileName());
        blob.uploadFromByteArray(bytes, 0, bytes.length);
        writer.append("File Uploaded to Azure Successful \n");

Solution

  • You don't really need to download the file from S3 and then upload it to Azure Blob Storage. Azure Blob Storage supports creating a new blob by copying an object from a publicly accessible URL. This is an asynchronous operation that Azure Storage performs on the server side, so the data never passes through your machine.

    Here's what you would need to do (in lieu of the code):

    • Create a signed (pre-signed) URL for the object in AWS S3, or make the object publicly readable.
    • Use the Azure Storage Java SDK to create a blob with the Copy Blob functionality, passing the signed URL as the source URL of the copy operation.
    • Once the copy has started, periodically fetch the properties of the blob. The properties include the copy properties, which report the copy status and the progress as bytes copied out of total bytes (from which you can derive a percentage). You can use these to monitor the progress of the copy.
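
The polling step above can be sketched with the JDK alone. This is a minimal sketch, not the SDK's own API: `CopyProgressMonitor`, `Snapshot`, `Status`, and `awaitCompletion` are hypothetical names introduced for illustration, and the copy state is simulated so the example runs without cloud credentials. With the legacy Azure Storage SDK used in the question, the copy would presumably be started with `blob.startCopy(...)` on a URI built from `s3Client.generatePresignedUrl(...)`, and the supplier would call `blob.downloadAttributes()` and map `blob.getCopyState()` to a `Snapshot`; verify those calls against your SDK version.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Sketch: poll a server-side copy until it leaves the PENDING state. */
final class CopyProgressMonitor {

    /** Hypothetical status values for a copy operation. */
    enum Status { PENDING, SUCCESS, ABORTED, FAILED }

    /** Hypothetical snapshot of the copy properties read at one point in time. */
    static final class Snapshot {
        final Status status;
        final long bytesCopied;
        final long totalBytes;

        Snapshot(Status status, long bytesCopied, long totalBytes) {
            this.status = status;
            this.bytesCopied = bytesCopied;
            this.totalBytes = totalBytes;
        }

        /** Percentage derived from the byte counts; 0 when the total is not known yet. */
        double percent() {
            return totalBytes <= 0 ? 0.0 : 100.0 * bytesCopied / totalBytes;
        }
    }

    /**
     * Calls source once per poll, reports each snapshot, and returns the first
     * snapshot whose status is no longer PENDING.
     */
    static Snapshot awaitCompletion(Supplier<Snapshot> source, long pollMillis,
                                    Consumer<Snapshot> onProgress) {
        while (true) {
            Snapshot s = source.get();
            onProgress.accept(s);
            if (s.status != Status.PENDING) {
                return s;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the copy", e);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated property reads. In real code the supplier would refresh the
        // blob's properties and map its copy state to a Snapshot (assumption).
        Iterator<Snapshot> simulated = Arrays.asList(
                new Snapshot(Status.PENDING, 0, 1000),
                new Snapshot(Status.PENDING, 400, 1000),
                new Snapshot(Status.SUCCESS, 1000, 1000)).iterator();

        Snapshot last = awaitCompletion(simulated::next, 100,
                s -> System.out.printf("%s: %.1f%% (%d of %d bytes)%n",
                        s.status, s.percent(), s.bytesCopied, s.totalBytes));
        System.out.println("Final status: " + last.status);
    }
}
```

Keeping the polling loop behind a `Supplier` means the same loop can be tested with simulated data and later wired to the real blob properties. A longer poll interval (for example a few seconds) is likely more appropriate for large objects than the short one used in the demo.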

    I wrote a blog post a long time ago (when asynchronous Copy Blob was first introduced) about copying objects from Amazon S3 to Azure Blob Storage. You can read it here: https://gauravmantri.com/2012/06/14/how-to-copy-an-object-from-amazon-s3-to-windows-azure-blob-storage-using-copy-blob/.