My Azure web app needs to download 1000+ very small files from a blob storage directory and process them.
If I list them and then download them one by one, it takes ages. Is there a faster way to do it, e.g. downloading them all together?
PS: I use the following code:

import pickle
from azure.storage.blob import ContainerClient, BlobClient

blob_list = ...  # list all files in a blob storage directory
for blob_name in blob_list:
    blob_client = BlobClient.from_connection_string(connection_string, container_name, blob_name)
    downloader = blob_client.download_blob()
    data = pickle.loads(downloader.readall())
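One pattern I'm considering is parallelizing the loop with a thread pool, since the files are tiny and most of the time seems to be per-request round-trip latency. A minimal sketch of that pattern — `fetch` is a stand-in for the real download call (in my case it would wrap `blob_client.download_blob().readall()`), and the worker count is just illustrative:

```python
import pickle
from concurrent.futures import ThreadPoolExecutor

def download_all(blob_names, fetch, max_workers=32):
    """Download and unpickle many small blobs concurrently.

    fetch(name) -> bytes; a real implementation would create a
    BlobClient for `name` and call download_blob().readall().
    Results come back in the same order as `blob_names`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return [pickle.loads(raw) for raw in pool.map(fetch, blob_names)]
```

With a real `fetch`, this keeps many requests in flight instead of waiting on each round trip one by one.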
I would also point out that since you are using azure-batch, you could use the blob mount configuration on your Linux VMs. The idea is to mount the blob container as a drive attached to the VM, which takes out the per-file download time entirely: the blobs simply appear in the VM's local filesystem.
Docs: https://learn.microsoft.com/en-us/azure/batch/virtual-file-mount
Python SDK reference: https://learn.microsoft.com/en-us/python/api/azure-batch/azure.batch.models.mountconfiguration?view=azure-python
Blob file system configuration: https://learn.microsoft.com/en-us/python/api/azure-batch/azure.batch.models.azureblobfilesystemconfiguration?view=azure-python
Key thing (just for knowledge): under the hood, the blob file system mount uses the blobfuse driver. https://learn.microsoft.com/en-us/azure/batch/virtual-file-mount#azure-blob-file-system
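For example, a pool with a blob mount could look roughly like this. This is a sketch based on the azure-batch models linked above, not tested code; the account name, container name, SAS token, VM size, and image are all placeholders:

```python
import azure.batch.models as batchmodels

# Sketch: mount a blob container into every node of a Batch pool.
# Account/container names and the SAS token below are placeholders.
mount_config = batchmodels.MountConfiguration(
    azure_blob_file_system_configuration=batchmodels.AzureBlobFileSystemConfiguration(
        account_name="mystorageaccount",
        container_name="mycontainer",
        relative_mount_path="data",  # appears under $AZ_BATCH_NODE_MOUNTS_DIR/data
        sas_key="<sas-token>",
        blobfuse_options="-o attr_timeout=240 -o entry_timeout=240",
    )
)

pool = batchmodels.PoolAddParameter(
    id="mypool",
    vm_size="standard_d2s_v3",
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher="canonical",
            offer="ubuntuserver",
            sku="18.04-lts",
            version="latest",
        ),
        node_agent_sku_id="batch.node.ubuntu 18.04",
    ),
    mount_configuration=[mount_config],
)
```

Tasks running on the pool can then read the files directly from the mount path, with no explicit download step.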
Thanks, and hope this helps.