Tags: azure, azure-blob-storage, azure-python-sdk

Why do we need to open a file to upload it to Azure Blob storage?


I'm new to Azure and am trying to upload tens of thousands of files to Azure Blob Storage using the Azure Python SDK. All the examples I came across on the web open the file before uploading it:

https://learn.microsoft.com/en-us/python/api/overview/azure/storage-blob-readme?view=azure-python#uploading-a-blob

https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-upload-python#upload-a-block-blob-from-a-local-file-path

Why is this necessary? I am concerned that this will slow down the upload.

Boto3 for AWS S3 doesn't do this. Can you please explain the reason behind this?


Solution

  • Why is this necessary? I am concerned that this will slow down the upload.

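    The upload_blob method in the Azure Blob Storage client library takes the data to upload rather than a file path. That data can be bytes, a string, or a file-like object, so to upload a file from disk you open it and pass the open handle.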

    Opening a file before uploading it should not slow the upload in any meaningful way. open() only obtains a file handle; it does not read the contents. The client library then reads from that handle as it uploads, so a large file can be sent in chunks instead of being loaded into memory all at once. In most cases the network transfer, not the local open() call, is what takes the time.
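
    As a rough illustration of the "read in chunks" idea, the sketch below pulls data from a binary file-like object one block at a time. It is a stand-in for the general pattern, not the Azure SDK's actual code; the name iter_chunks and the 4 MiB default are assumptions made up for this example.

```python
import io

CHUNK_SIZE = 4 * 1024 * 1024  # illustrative value, not an SDK setting


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield successive blocks of bytes from a binary file-like object."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            return
        yield chunk


if __name__ == "__main__":
    # io.BytesIO stands in for an opened file; any object with read() works.
    stream = io.BytesIO(b"x" * 10)
    print([len(c) for c in iter_chunks(stream, chunk_size=4)])  # prints [4, 4, 2]
```

    Nothing is read until the consumer asks for it, which is why handing over an open handle costs little up front.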

    Code:

    # blob is an existing azure.storage.blob.BlobClient
    with open("./SampleSource.txt", "rb") as data:
        blob.upload_blob(data)
    

    The code above opens the file in binary mode. open() returns a file-like object, and upload_blob reads from it and writes the contents to the blob.
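
    If you would rather pass paths around, as you are used to, a small wrapper can hide the open() call. This is a minimal sketch: upload_from_path is a hypothetical helper, and container_client is assumed to behave like azure.storage.blob.ContainerClient, whose upload_blob method accepts name, data, and overwrite.

```python
from pathlib import Path


def upload_from_path(container_client, path, blob_name=None):
    """Open `path` in binary mode and stream it to `container_client`.

    `container_client` is assumed to behave like
    azure.storage.blob.ContainerClient (it needs an upload_blob method).
    """
    path = Path(path)
    with path.open("rb") as data:
        # The open handle is passed through; the client does the reading.
        return container_client.upload_blob(
            name=blob_name or path.name,  # default to the file name
            data=data,
            overwrite=True,
        )
```

    For tens of thousands of files, you could call this helper in a loop over your paths while reusing a single client.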

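    Boto3 is not fundamentally different. Its upload_file method takes a file path and opens the file internally, which is why no open() call appears in your own code. Boto3 also provides upload_fileobj, which takes a file object that you have already opened in binary mode, just like the Azure example.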

    Code:

    import boto3
    s3 = boto3.client('s3')
    with open('filename', 'rb') as data:
        s3.upload_fileobj(data, 'mybucket', 'mykey')
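
    To make the comparison concrete, here are the two Boto3 styles side by side. This is only a sketch: the function names are made up for illustration, and s3 is assumed to be a client created with boto3.client("s3").

```python
def upload_by_path(s3, path, bucket, key):
    """Pass the path; Boto3 opens and reads the file itself."""
    s3.upload_file(path, bucket, key)


def upload_by_handle(s3, path, bucket, key):
    """Open the file ourselves and pass the binary handle."""
    with open(path, "rb") as data:
        s3.upload_fileobj(data, bucket, key)
```

    In both cases the file is opened once; the only difference is whether your code or the library makes the open() call.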
    

    Reference:

    Uploading files - Boto3 1.34.64 documentation (amazonaws.com)