I need to calculate the size of an ADLS folder, but blobs in the Archive access tier must be excluded from the total. If I use
$Blobs = Get-AzStorageBlob -Context $ctx -Container $containerName -Prefix $folderName
it gives me the size, but I can't find a way to filter by access tier.
If I use BlobServiceClient instead, the code does not scale: it runs forever when there are millions of files.
blob_service_client = BlobServiceClient(account_url=account_url, credential=storage_account_key)
container_client = blob_service_client.get_container_client(container_name)
blob_list = container_client.list_blobs(name_starts_with=folder_path)
for blob in blob_list:
    blob_client = container_client.get_blob_client(blob.name)
    blob_properties = blob_client.get_blob_properties()
    if blob_properties.blob_tier != "Archive":
        total_size += blob_properties.size
Is there an easy and scalable way to achieve this?
Thanks
Your code is slow because it calls blob_client.get_blob_properties() for every blob, which sends one extra request per blob. You don't need that call: the properties should already be available on the items returned when you list the blobs.
Please try the following code:
from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient(account_url=account_url, credential=storage_account_key)
container_client = blob_service_client.get_container_client(container_name)
# Each listed item already carries the blob's tier and size
blob_list = container_client.list_blobs(name_starts_with=folder_path)
total_size = 0
for blob in blob_list:
    # Skip blobs in the Archive tier
    if blob.blob_tier != "Archive":
        total_size += blob.size
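If you want to check the filtering logic without touching a storage account, here is a minimal sketch. The function name non_archive_size and the sample objects are made up for illustration; the only assumption is that each item exposes blob_tier and size attributes, like the items in the listing above.

```python
from types import SimpleNamespace
from typing import Iterable


def non_archive_size(blobs: Iterable, excluded_tier: str = "Archive") -> int:
    """Sum the sizes of all blobs whose tier is not `excluded_tier`.

    Hypothetical helper: `blobs` can be any iterable of objects that
    have `blob_tier` and `size` attributes.
    """
    total = 0
    for blob in blobs:
        # Blobs with no tier set (blob_tier is None) are counted too
        if blob.blob_tier != excluded_tier:
            total += blob.size
    return total


# Stand-in objects that mimic listed blobs (illustrative data only)
sample = [
    SimpleNamespace(name="folder/a.csv", blob_tier="Hot", size=100),
    SimpleNamespace(name="folder/b.csv", blob_tier="Archive", size=5000),
    SimpleNamespace(name="folder/c.csv", blob_tier="Cool", size=250),
]

print(non_archive_size(sample))  # 350
```

With the real SDK you would pass container_client.list_blobs(name_starts_with=folder_path) as the argument. The helper consumes the listing one item at a time, so it should not need to hold millions of blobs in memory.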
Also, looking at the documentation of Get-AzStorageBlob, the output of the cmdlet is a list of blobs of type AzureStorageBlob, which has a property called AccessTier. What you can do is loop through the blobs returned by this cmdlet and filter by access tier to get the desired information.