Tags: azure, azure-blob-storage, azure-batch

Azure/Python - download files quickly from storage


My Azure web app needs to download 1000+ very small files from a blob storage directory and process them.

If I list them and then download them one by one, it takes ages. Is there a faster way to do it, such as downloading them all together?

PS: I use the following code:

import pickle

from azure.storage.blob import ContainerClient, BlobClient

blob_list = #... list all files in a blob storage directory

# Download and unpickle each blob sequentially, one request at a time
for blob in blob_list:
    blob_client = BlobClient.from_connection_string(connection_string, container_name, blob)
    downloader = blob_client.download_blob()
    obj = pickle.loads(downloader.readall())  # separate name so the loop variable is not overwritten

Solution

  • I would also point out that since you are using azure-batch, you could use the blob mount configuration on your Linux VMs. The idea is to mount the blob container as a drive on the VM, so your code reads the files from the attached drive instead of downloading them one by one.

    Thanks, and hope this helps.
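
To illustrate the mount idea: once the container is mounted, the processing loop could read the pickled files as ordinary local files. The sketch below is only an illustration under assumptions. It assumes the pool was created with a blob mount whose relative mount path is "blobdata" (a hypothetical name), and that the node exposes the mount root through the AZ_BATCH_NODE_MOUNTS_DIR environment variable. The helper names `mounted_blob_dir` and `load_pickles` are made up for this example.

```python
import os
import pickle
from pathlib import Path


def mounted_blob_dir(relative_mount_path="blobdata"):
    """Return the local path of the mounted container.

    "blobdata" is a hypothetical relative mount path; use whatever the pool's
    mount configuration actually specifies. Falls back to the current directory
    when AZ_BATCH_NODE_MOUNTS_DIR is not set (e.g. when run off a Batch node).
    """
    root = os.environ.get("AZ_BATCH_NODE_MOUNTS_DIR", ".")
    return Path(root) / relative_mount_path


def load_pickles(directory):
    """Unpickle every regular file directly under `directory`, keyed by file name."""
    results = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with path.open("rb") as fh:
                # Only unpickle data you trust: pickle can execute arbitrary code.
                results[path.name] = pickle.load(fh)
    return results
```

On a node with the mount in place, `objects = load_pickles(mounted_blob_dir())` would replace the per-blob `BlobClient` loop from the question. No storage SDK calls are needed in the processing code itself.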