First off, sorry for the long post - I'm trying to be detailed!
I'm looking to automate a workaround for an issue I discovered. I have a worker that periodically bombs once the "working" directory has more than 100,000 files in it. As a preventative measure, I can stop the process, rename the working directory to "HOLD", and create a new working directory to keep it going. Then I move files from the HOLD folder(s) back into the working directory a little at a time until it's caught up.
What I would like to do is automate the entire process via Task Scheduler with 2 PowerShell scripts.
----SCRIPT 1----
Here's the condition:
I find that [System.IO.Directory]::EnumerateFiles($Working) is faster than Get-ChildItem.
The actions:
Stop-Service for Service1, Service2, Service3
Rename-Item -Path "C:\Prod\Working\" -NewName "Hold" or "Hold1", "Hold2", "Hold3", etc. if the folder already exists. I'm not particular about the numbering as long as it is consistent, so if it's easier to let the system name it HOLD, HOLD(1), HOLD(2), etc., or to append the date after HOLD, that's fine.
New-Item C:\Prod\Working -type directory
Start-Service for Service1, Service2, Service3
----SCRIPT 2----
Condition:
Actions:
Before it comes up, I'm well aware it would be easier to simply move the files from the working folder to a Hold folder, but the size of the files can be very large and moving them always seems to take much longer.
I greatly appreciate any input and I'm eager to see some solid answers!
EDIT
Here's what I'm running for Script 2, courtesy of Bacon:
#Setup
# Ensure there's enough room so that restoring $restoreBatchSize
# files won't push $Working's file count above $restoreThreshold
$restoreThreshold = 30000;
$restoreBatchSize = 500;
$Working = "E:\UnprocessedTEST\"
$HoldBaseDirectory = "E:\"
while (@(Get-ChildItem -File -Path $Working).Length -lt $restoreThreshold - $restoreBatchSize)
{
    $holdDirectory = Get-ChildItem -Path $HoldBaseDirectory -Directory -Filter '*Hold*' |
        Select-Object -Last 1;
    if ($holdDirectory -eq $null)
    {
        # There are no Hold directories to process; don't keep looping
        break;
    }
    # Restore the first $restoreBatchSize files from $holdDirectory and store the count of files restored
    # (.FullName is used because Windows PowerShell may convert the directory object to its bare name)
    $restoredCount = Get-ChildItem -File -Path $holdDirectory.FullName |
        Select-Object -First $restoreBatchSize |
        Move-Item -Destination $Working -PassThru |
        Measure-Object |
        Select-Object -ExpandProperty 'Count';
    # If less than $restoreBatchSize files were restored then $holdDirectory is now empty; delete it
    if ($restoredCount -lt $restoreBatchSize)
    {
        Remove-Item -Path $holdDirectory.FullName;
    }
}
The first script could look like this:
$rotateThreshold = 60000;
$isThresholdExceeded = @(
    Get-ChildItem -File -Path $Working `
        | Select-Object -First ($rotateThreshold + 1) `
).Length -gt $rotateThreshold;
#Alternative: $isThresholdExceeded = @(Get-ChildItem -File -Path $Working).Length -gt $rotateThreshold;
if ($isThresholdExceeded)
{
    Stop-Service -Name 'Service1', 'Service2', 'Service3';
    try
    {
        $newName = 'Hold_{0:yyyy-MM-ddTHH-mm-ss}' -f (Get-Date);
        Rename-Item -Path $Working -NewName $newName;
    }
    finally
    {
        New-Item -ItemType Directory -Path $Working -ErrorAction SilentlyContinue;
        Start-Service -Name 'Service1', 'Service2', 'Service3';
    }
}
The reason for assigning $isThresholdExceeded the way I do is that we don't care what the exact file count is, only whether it's above or below the threshold. As soon as we know the threshold has been exceeded we don't need any further results from Get-ChildItem (or, likewise, from [System.IO.Directory]::EnumerateFiles($Working)), so as an optimization Select-Object will terminate the pipeline on the element after the threshold is reached. In a directory with 100,000 files on an SSD I found this to be almost 40% faster than allowing Get-ChildItem to enumerate all files (4.12 vs. 6.72 seconds). Other implementations using foreach or ForEach-Object proved to be slower than @(Get-ChildItem -File -Path $Working).Length.
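The same early-exit idea can be sketched on top of [System.IO.Directory]::EnumerateFiles(), which the question mentions. This is only an illustration: the function name Test-FileCountExceeds is made up here, and no timing is claimed for it relative to the Get-ChildItem version above.

```powershell
# Hypothetical helper (not part of the scripts above): returns $true as soon as
# more than $Threshold files have been seen in $Path, without enumerating the rest.
function Test-FileCountExceeds
{
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [int] $Threshold
    )

    $count = 0;
    # EnumerateFiles() returns a lazy sequence, so returning early
    # means the remaining directory entries are never read
    foreach ($file in [System.IO.Directory]::EnumerateFiles($Path))
    {
        $count++;
        if ($count -gt $Threshold)
        {
            return $true;
        }
    }

    return $false;
}

# Possible usage in the first script:
# $isThresholdExceeded = Test-FileCountExceeds -Path $Working -Threshold $rotateThreshold;
```

Note that .NET methods resolve relative paths against the process's current directory rather than PowerShell's current location, so it is safest to pass a full path.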
As for generating the new name for the 'Hold' directories, you could save and update an identifier somewhere, or just generate new names with an incrementing suffix until you find one that's not in use. I think it's easier to just base the name on the current time. As long as the script doesn't run more than once a second you'll know the name is unique. Timestamped names sort just as well as numerals, and they give you a little diagnostic information (the time that directory was rotated out) for free.
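If you'd like a safety net for the unlikely case where the timestamped name is already taken, the two naming ideas can be combined. The following is a minimal sketch under that assumption; Get-UniqueHoldName and its parameters are names invented for this example.

```powershell
# Hypothetical helper: builds a timestamped 'Hold_' name and, if a directory
# with that name already exists under $BaseDirectory, appends a counter
# (_1, _2, ...) until an unused name is found.
function Get-UniqueHoldName
{
    param(
        [Parameter(Mandatory)] [string] $BaseDirectory,
        [datetime] $Timestamp = (Get-Date)
    )

    $baseName = 'Hold_{0:yyyy-MM-ddTHH-mm-ss}' -f $Timestamp;
    $candidate = $baseName;
    $suffix = 1;
    while (Test-Path -LiteralPath (Join-Path $BaseDirectory $candidate))
    {
        $candidate = '{0}_{1}' -f $baseName, $suffix;
        $suffix++;
    }

    return $candidate;
}

# Possible usage in the first script, assuming $Working has no trailing backslash:
# $newName = Get-UniqueHoldName -BaseDirectory (Split-Path -Parent $Working);
```

The suffixed names still start with 'Hold_', so a 'Hold_*' filter continues to match them.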
Here's some basic code for the second script:
$restoreThreshold = 50000;
$restoreBatchSize = 5000;
# Ensure there's enough room so that restoring $restoreBatchSize
# files won't push $Working's file count above $restoreThreshold
while (@(Get-ChildItem -File -Path $Working).Length -lt $restoreThreshold - $restoreBatchSize)
{
    $holdDirectory = Get-ChildItem -Path $HoldBaseDirectory -Directory -Filter 'Hold_*' `
        | Select-Object -First 1;
    if ($holdDirectory -eq $null)
    {
        # There are no Hold directories to process; don't keep looping
        break;
    }
    # Restore the first $restoreBatchSize files from $holdDirectory and store the count of files restored
    $restoredCount = Get-ChildItem -File -Path $holdDirectory.FullName `
        | Select-Object -First $restoreBatchSize `
        | Move-Item -Destination $Working -PassThru `
        | Measure-Object `
        | Select-Object -ExpandProperty 'Count';
    # If less than $restoreBatchSize files were restored then $holdDirectory is now empty; delete it
    if ($restoredCount -lt $restoreBatchSize)
    {
        Remove-Item -Path $holdDirectory.FullName;
    }
}
As noted in the comment before the while loop, the condition ensures that the count of files in $Working is at least $restoreBatchSize files away from $restoreThreshold, so that restoring $restoreBatchSize files won't exceed the threshold in the process. If you don't care about that, or the chosen threshold already accounts for it, you can change the condition to compare against $restoreThreshold instead of $restoreThreshold - $restoreBatchSize. Alternatively, leave the condition the same and change $restoreThreshold to 55000.
The way I've written the loop, on each iteration at most $restoreBatchSize files will be restored from the first 'Hold_*' directory it finds, then the file count in $Working is reevaluated. Considering that, as I understand it, there are files being added and removed from $Working external to this script and simultaneous to its execution, this might be the safest approach and also the simplest approach. You could certainly enhance this by calculating how far below $restoreThreshold you are and performing the necessary number of batch restores, from one or more 'Hold_*' directories, all in one iteration of the loop.
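A rough sketch of that enhancement follows. It counts $Working once, works out how many files can be restored, and then draws on as many 'Hold_*' directories as needed. Restore-HeldFiles and its parameter names are invented for this example, and it assumes no file name in a Hold directory collides with one already in $Working. Because the count is only a snapshot, files added to $Working by other processes during the run are not accounted for.

```powershell
# Hypothetical helper: moves files from the 'Hold_*' directories (oldest name
# first) into $Working until $Working reaches $RestoreThreshold files or
# there is nothing left to restore.
function Restore-HeldFiles
{
    param(
        [Parameter(Mandatory)] [string] $Working,
        [Parameter(Mandatory)] [string] $HoldBaseDirectory,
        [Parameter(Mandatory)] [int] $RestoreThreshold
    )

    # How many files can be restored before reaching the threshold
    $remaining = $RestoreThreshold - @(Get-ChildItem -File -Path $Working).Length;

    $holdDirectories = @(
        Get-ChildItem -Path $HoldBaseDirectory -Directory -Filter 'Hold_*' |
            Sort-Object -Property Name
    );
    foreach ($holdDirectory in $holdDirectories)
    {
        if ($remaining -le 0)
        {
            break;
        }

        # Collect the batch first so the directory isn't modified while it is being listed
        $batch = @(
            Get-ChildItem -File -Path $holdDirectory.FullName |
                Select-Object -First $remaining
        );
        $moved = @($batch | Move-Item -Destination $Working -PassThru).Length;
        $remaining -= $moved;

        # Delete the Hold directory once it has no files left
        $hasFiles = @(Get-ChildItem -File -Path $holdDirectory.FullName | Select-Object -First 1).Length -gt 0;
        if (-not $hasFiles)
        {
            Remove-Item -Path $holdDirectory.FullName;
        }
    }
}

# Possible usage in the second script:
# Restore-HeldFiles -Working $Working -HoldBaseDirectory $HoldBaseDirectory -RestoreThreshold $restoreThreshold;
```

The per-batch loop above remains the more cautious option when the worker is actively adding files, since it rechecks the count between batches.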