I have a PowerShell Azure Automation runbook that iterates through a large storage account and enforces file age policies on the blobs within the account. This runs fine but runs up against the fair share limit of 3 hours. I could use Hybrid Workers, but I would prefer to run multiple child runbooks in parallel, each handling a different portion of the storage account based on the first letter of the blob name.
Example:
First child runbook: A-M; second: N-Z; third: a-m; fourth: n-z.
I'm thinking of using a prefix variable within a loop that will iterate between letters.
## Declaring the variables
$number_of_days_bak_threshold = 15
$number_of_days_trn_threshold = 2
$current_date = get-date
$date_before_blobs_to_be_deleted_bak = $current_date.AddDays(-$number_of_days_bak_threshold)
$date_before_blobs_to_be_deleted_trn = $current_date.AddDays(-$number_of_days_trn_threshold)
# Number of blobs deleted
$blob_count_deleted = 0
# Storage account details
$storage_account_name = "<Account Name>"
$storage_account_key = "<Account Key>"
$container = "<Container>"
## Creating Storage context for Source, destination and log storage accounts
$context = New-AzureStorageContext -StorageAccountName $storage_account_name -StorageAccountKey $storage_account_key
$blob_list = Get-AzureStorageBlob -Context $context -Container $container
## Creating log file
$log_file = "log-" + (Get-Date -Format "yyyy-MM-dd-HH-mm-ss") + ".txt"
$local_log_file_path = Join-Path $env:temp $log_file
write-host "Log file saved as: " $local_log_file_path -ForegroundColor Green
## Iterate through each blob
foreach ($blob_iterator in $blob_list) {
    $blob_date = [datetime]$blob_iterator.LastModified.UtcDateTime
    # Check if the blob's last modified date is older than the threshold date for deletion of trn files:
    if ($blob_iterator.Name -like "*.trn") {
        if ($blob_date -le $date_before_blobs_to_be_deleted_trn) {
            Write-Output "-----------------------------------" | Out-File $local_log_file_path -Append
            Write-Output "Purging blob from Storage: $($blob_iterator.Name)" | Out-File $local_log_file_path -Append
            Write-Output " " | Out-File $local_log_file_path -Append
            Write-Output "Last Modified Date of the Blob: $blob_date" | Out-File $local_log_file_path -Append
            Write-Output "-----------------------------------" | Out-File $local_log_file_path -Append
            # Cmdlet to delete the blob
            Remove-AzureStorageBlob -Container $container -Blob $blob_iterator.Name -Context $context
            $blob_count_deleted += 1
            Write-Output "Deleted: $($blob_iterator.Name)"
        }
    }
    elseif ($blob_iterator.Name -like "*.bak") {
        if ($blob_date -le $date_before_blobs_to_be_deleted_bak) {
            Write-Output "-----------------------------------" | Out-File $local_log_file_path -Append
            Write-Output "Purging blob from Storage: $($blob_iterator.Name)" | Out-File $local_log_file_path -Append
            Write-Output " " | Out-File $local_log_file_path -Append
            Write-Output "Last Modified Date of the Blob: $blob_date" | Out-File $local_log_file_path -Append
            Write-Output "-----------------------------------" | Out-File $local_log_file_path -Append
            # Cmdlet to delete the blob
            Remove-AzureStorageBlob -Container $container -Blob $blob_iterator.Name -Context $context
            $blob_count_deleted += 1
            Write-Output "Deleted: $($blob_iterator.Name)"
        }
    }
    else {
        Write-Error "Unable to determine file type: $($blob_iterator.Name)"
    }
}
Write-Output "Blobs deleted: $blob_count_deleted" | Out-File $local_log_file_path -Append
I expect to be able to run through the account in parallel.
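Roughly, I picture each child runbook looking something like the sketch below (the $Prefixes parameter and the child-runbook split are just how I imagine it; I would reuse the variables and the purge loop from the script above). Since blob-name prefix matching is case-sensitive, the A-M / a-m split should work.
param(
    # e.g. the first child gets "A".."M", the second "N".."Z", and so on
    [Parameter(Mandatory = $true)]
    [string[]] $Prefixes
)

# Storage account details, thresholds and context declared as in the script above...

foreach ($prefix in $Prefixes) {
    # Only list the blobs whose names start with this prefix, so each child
    # touches only its own slice of the container.
    $blob_list = Get-AzureStorageBlob -Context $context -Container $container -Prefix $prefix
    # ... same age-based purge loop as above ...
}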
So, I agree with @4c74356b41 that breaking the workload down is the best approach. However, that is, in itself, not always as simple as it might sound. Below I describe the various workarounds for fairshare and the potential issues I can think of off the top of my head. It is quite a lot of information, so here are the highlights:
TL;DR
No matter what, there are ways to break up a sequential workload into sequentially executed jobs, e.g. each job works on a segment and then starts the next job as its last operation. (Like a kind of recursion.) However, managing a sequential approach so that it correctly handles intermittent failures can add a lot of complexity.
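For example, each segment runbook could do its slice of the work and then, as its very last operation, start the job for the next segment. A minimal sketch, assuming the AzureRM.Automation cmdlets and made-up resource group, account, and runbook names:
param([int] $SegmentIndex = 0)

$segments = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.ToCharArray()   # one segment per first letter, say

# ... purge all blobs whose names start with $segments[$SegmentIndex] ...

# Last operation: hand off to the next segment (the "recursion" mentioned above).
# Assumes the runbook has already authenticated, e.g. via the RunAs account.
if ($SegmentIndex + 1 -lt $segments.Count) {
    Start-AzureRmAutomationRunbook -ResourceGroupName "MyRg" -AutomationAccountName "MyAA" `
        -Name "Purge-BlobSegment" -Parameters @{ SegmentIndex = $SegmentIndex + 1 }
}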
If the workload can be broken down into smaller jobs that do not use a lot of resources, then you could do the work in parallel. In other words, if the amount of memory and socket resources required by each segment is low, and as long as there is no overlap or contention, this approach should run much faster in parallel. I also suspect that, in parallel, the combined job minutes will still be less than the minutes a sequential approach would need.
There is one gotcha with processing the segments in parallel...
When a bunch of Azure Automation (AA) jobs belonging to the same account are started together, the likelihood that they will all run within the same sandbox instance increases significantly. Sandboxes are never shared with unrelated accounts, but because of improvements in job start performance, there is a preference to share sandboxes for jobs within the same account. When these jobs all run at the same time, there is an increased likelihood that the overall sandbox resource quota will be hit, and then the sandbox will perform a hard exit immediately.
Because of this gotcha, if your workload is memory or socket intensive, you may want to have a parent runbook that controls the lifecycle (i.e. start rate) of the child runbooks. This has the twisted effect that the parent runbook could now hit the fairshare limit.
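A rough sketch of such a parent, throttling how many children are in flight at once (again assuming the AzureRM.Automation cmdlets and made-up names):
$segments = 65..90 | ForEach-Object { [string][char]$_ }   # "A".."Z", one child per prefix

$running = @()
foreach ($segment in $segments) {
    # Keep at most a few children running at once to stay under the sandbox quotas.
    while ($running.Count -ge 3) {
        Start-Sleep -Seconds 60
        $running = @($running | Where-Object {
            (Get-AzureRmAutomationJob -ResourceGroupName "MyRg" -AutomationAccountName "MyAA" -Id $_).Status -notin 'Completed', 'Failed', 'Stopped'
        })
    }
    $job = Start-AzureRmAutomationRunbook -ResourceGroupName "MyRg" -AutomationAccountName "MyAA" `
        -Name "Purge-BlobSegment" -Parameters @{ Segment = $segment }
    $running += $job.JobId
}
Note that the parent spends most of its time sleeping, but those minutes still count toward its own fairshare window, which is exactly the gotcha described above.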
The next workaround is to implement runbooks that kick off the job for the next processing segment when they complete. The best approach for this is to store the next segment somewhere the job can retrieve it, e.g. an Automation variable or a blob. This way, even if a job fails partway through its segment, as long as there is some way of making sure jobs keep running until the entire workload finishes, everything will eventually get done. You might want to use a watcher task to verify eventual completion and handle retries. Once you get to this level of complexity, you can experiment to discover how much parallelism you can introduce without hitting resource limits.
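A sketch of that hand-off, using an Automation variable asset as the cursor (the asset and runbook names are made up; Get-AutomationVariable and Set-AutomationVariable are the internal cmdlets available inside a runbook):
# Read the cursor left behind by the previous job; fall back to the first segment.
$segment = Get-AutomationVariable -Name 'PurgeCursor'
if ([string]::IsNullOrEmpty($segment)) { $segment = 'A' }

# ... purge the blobs for $segment ...

# Persist the next segment *before* starting the next job, so a watcher task
# (or a manual restart) can resume from here even if the hand-off fails.
# Naive next-letter logic, for illustration only.
Set-AutomationVariable -Name 'PurgeCursor' -Value ([string][char]([int][char]$segment + 1))
Start-AzureRmAutomationRunbook -ResourceGroupName "MyRg" -AutomationAccountName "MyAA" -Name "Purge-BlobSegment"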
If you have no worries about hitting resource limits, you could consider using the ThreadJob module from the PowerShell Gallery. With this approach, you would still have a single runbook, but now you would be able to parallelize the workload within that runbook and complete it before hitting the fairshare limit. This can be very effective if the individual tasks are fast and lightweight. Otherwise, this approach may work for a little while but begin to fail as the workload grows in either the time or the resources required.
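A minimal sketch of that shape, assuming the ThreadJob module has been imported into the Automation account and ignoring the .trn/.bak distinction for brevity (thread jobs run in the same process, so the live storage context can be passed straight in):
$prefixes = 65..90 | ForEach-Object { [string][char]$_ }   # "A".."Z"

$jobs = foreach ($prefix in $prefixes) {
    # Fan the prefixes out across a small pool of threads inside this one job.
    Start-ThreadJob -ThrottleLimit 4 -ArgumentList $prefix, $context, $container -ScriptBlock {
        param($prefix, $context, $container)
        Get-AzureStorageBlob -Context $context -Container $container -Prefix $prefix |
            Where-Object { $_.LastModified.UtcDateTime -le (Get-Date).AddDays(-15) } |
            ForEach-Object { Remove-AzureStorageBlob -Context $context -Container $container -Blob $_.Name }
    }
}

$jobs | Receive-Job -Wait -AutoRemoveJob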
Do not use PowerShell Jobs within an AA job to achieve parallelism. This includes not using commands like Parallel-ForEach. There are a lot of examples of VM-Start/Stop runbooks that use PowerShell Jobs; this is not a recommended approach. PowerShell Jobs require a lot of resources to execute, so using them will significantly increase the resources used by your AA job and the chance of hitting the memory quota.
You can get around the fairshare limitation by re-implementing your code as a PowerShell Workflow and performing frequent checkpoints. When a workflow job hits the fairshare limit, if it has been performing checkpoints, it will be restarted on another sandbox and resume from the last checkpoint.
My recollection is that your jobs need to perform a checkpoint at least once every 30 minutes. If they do, they will resume after each fairshare eviction without any penalty, forever (at the cost of a tremendous number of job minutes).
Even without a checkpoint, a workflow will get retried 2 times after hitting the fairshare limit. Because of this, if your workflow code is idempotent and quickly skips previously completed work, your job may complete (within 9 hours) even without checkpoints.
However, workflows are not just PowerShell script wrapped inside a workflow {} script block; there are real differences in syntax and semantics, so expect some porting effort.
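To make that concrete, the overall shape is roughly the following (a sketch only; where you place the checkpoints, and how much re-work you accept after a resume, is up to you):
workflow Purge-OldBlobs {
    param([string] $StorageAccountName, [string] $StorageAccountKey, [string] $Container)

    # One pass per first-letter prefix; 65..90 are the character codes for "A".."Z".
    foreach ($code in 65..90) {
        # InlineScript runs ordinary PowerShell; workflow-level code has restrictions.
        InlineScript {
            $prefix  = [string][char]$Using:code
            $context = New-AzureStorageContext -StorageAccountName $Using:StorageAccountName -StorageAccountKey $Using:StorageAccountKey
            Get-AzureStorageBlob -Context $context -Container $Using:Container -Prefix $prefix |
                Where-Object { $_.LastModified.UtcDateTime -le (Get-Date).AddDays(-15) } |
                ForEach-Object { Remove-AzureStorageBlob -Context $context -Container $Using:Container -Blob $_.Name }
        }
        # Persist progress; if fairshare unloads the job, it resumes from the last checkpoint.
        Checkpoint-Workflow
    }
}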
Migrate the work into Azure Functions. With the recent release of PowerShell support in Functions, this might be relatively easy. Functions has different limitations than Automation; those differences might suit your workload well, or make things worse. I have not yet tried the Functions approach, so I can't really say.
The most obvious difference you will notice right away is that Functions is a much more raw, DevOps-oriented service than Automation. This is partly because Automation is a more mature product. (Automation was probably the first widely available serverless service, having launched roughly a year before Lambda.) Automation was purpose-built for automating the management of cloud resources, and automation is the driving factor in its feature selection, whereas Functions is a much more general-purpose serverless offering. Anyway, one obvious difference at the moment is that Functions does not have any built-in support for things like the RunAs account or Variables. I expect Functions will improve in this specific aspect over time, but right now it is pretty basic for automation tasks.
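For orientation only (I have not tested this), a timer-triggered PowerShell function is roughly this shape; the schedule lives in the function's function.json, the Az modules come from managed dependencies (requirements.psd1), and the app-setting names here are invented:
# run.ps1 of a timer-triggered function
param($Timer)

$context = New-AzStorageContext -StorageAccountName $env:STORAGE_ACCOUNT_NAME -StorageAccountKey $env:STORAGE_ACCOUNT_KEY
Get-AzStorageBlob -Context $context -Container $env:CONTAINER |
    Where-Object { $_.Name -like "*.bak" -and $_.LastModified.UtcDateTime -le (Get-Date).AddDays(-15) } |
    ForEach-Object { Remove-AzStorageBlob -Context $context -Container $env:CONTAINER -Blob $_.Name }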