PowerShell script to execute if threshold exceeded


First off, sorry for the long post - I'm trying to be detailed!

I'm looking to automate a workaround for an issue I discovered. I have a worker that periodically bombs once the "working" directory has more than 100,000 files in it. As a preventative measure I can stop the process, rename the working directory to "HOLD", and create a new working dir to keep it going. Then I move files from the HOLD folder(s) back into the working dir a little bit at a time until it's caught up.

What I would like to do is automate the entire process via Task Scheduler with 2 PowerShell scripts.

----SCRIPT 1----

Here's the condition:

  • If file count in working dir is greater than 60,000

I find that [System.IO.Directory]::EnumerateFiles($Working) is faster than Get-ChildItem.
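
For reference, the kind of lazy count I have in mind looks something like this (the path and threshold here are just placeholders from the description above):

# Count files lazily and stop as soon as the threshold is known to be exceeded
$Working = "C:\Prod\Working"
$threshold = 60000

$count = 0
foreach ($file in [System.IO.Directory]::EnumerateFiles($Working))
{
    $count++
    if ($count -gt $threshold) { break }
}

$isThresholdExceeded = $count -gt $threshold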

The actions:

  • Stop-Service for Service1, Service2, Service3
  • Rename-Item -Path "C:\Prod\Working\" -NewName "Hold" (or "Hold1", "Hold2", "Hold3", etc. if the folder already exists) --I'm not particular about the numbering as long as it's consistent, so if it's easier to let the system name them HOLD, HOLD(1), HOLD(2), etc., or to append the date after HOLD, that's fine.
  • New-Item C:\Prod\Working -type directory
  • Start-Service Service1, Service2, Service3

----SCRIPT 2----

Condition:

  • If file count in working dir is less than 50,000

Actions:

  • Move 5,000 files from the HOLD* folder(s) --Move 5k files from the HOLD folder until it's empty, then skip the empty folder and start moving files from HOLD1. This process should repeat dynamically through subsequent folders.

Before it comes up, I'm well aware it would be easier to simply move the files from the working folder to a Hold folder, but the size of the files can be very large and moving them always seems to take much longer.

I greatly appreciate any input and I'm eager to see some solid answers!

EDIT

Here's what I'm running for Script 2, courtesy of Bacon:

# Setup
$restoreThreshold = 30000;  # Ensure there's enough room so that restoring $restoreBatchSize
$restoreBatchSize = 500;    # files won't push $Working's file count above $restoreThreshold
$Working = "E:\UnprocessedTEST\"
$HoldBaseDirectory = "E:\"

while (@(Get-ChildItem -File -Path $Working).Length -lt $restoreThreshold - $restoreBatchSize)
{
    $holdDirectory = Get-ChildItem -Path $HoldBaseDirectory -Directory -Filter '*Hold*' `
        | Select-Object -Last 1;

    if ($holdDirectory -eq $null)
    {
        # There are no Hold directories to process; don't keep looping
        break;
    }

    # Restore the first $restoreBatchSize files from $holdDirectory and store the count of files restored
    $restoredCount = Get-ChildItem -File -Path $holdDirectory.FullName `
        | Select-Object -First $restoreBatchSize `
        | Move-Item -Destination $Working -PassThru `
        | Measure-Object `
        | Select-Object -ExpandProperty 'Count';

    # If fewer than $restoreBatchSize files were restored then $holdDirectory is now empty; delete it
    if ($restoredCount -lt $restoreBatchSize)
    {
        Remove-Item -Path $holdDirectory.FullName;
    }
}


Solution

  • The first script could look like this:

    $rotateThreshold = 60000;
    $isThresholdExceeded = @(
        Get-ChildItem -File -Path $Working `
            | Select-Object -First ($rotateThreshold + 1) `
    ).Length -gt $rotateThreshold;
    #Alternative: $isThresholdExceeded = @(Get-ChildItem -File -Path $Working).Length -gt $rotateThreshold;
    
    if ($isThresholdExceeded)
    {
        Stop-Service -Name 'Service1', 'Service2', 'Service3';
    
        try
        {
            $newName = 'Hold_{0:yyyy-MM-ddTHH-mm-ss}' -f (Get-Date);
    
            Rename-Item -Path $Working -NewName $newName;
        }
        finally
        {
            New-Item -ItemType Directory -Path $Working -ErrorAction SilentlyContinue;
            Start-Service -Name 'Service1', 'Service2', 'Service3';
        }
    }
    

    The reason for assigning $isThresholdExceeded the way I am is that we don't care what the exact count of files is, just whether it's above or below the threshold. As soon as we know the threshold has been exceeded we don't need any further results from Get-ChildItem (the same goes for [System.IO.Directory]::EnumerateFiles($Working)), so as an optimization Select-Object terminates the pipeline on the element after the threshold is reached. In a directory with 100,000 files on an SSD I found this to be almost 40% faster than allowing Get-ChildItem to enumerate all files (4.12 vs. 6.72 seconds). Other implementations using foreach or ForEach-Object proved to be slower than @(Get-ChildItem -File -Path $Working).Length.
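
    If you want to sanity-check that timing on your own directory, a quick Measure-Command comparison might look something like this (with $Working pointing at the directory in question):

    $rotateThreshold = 60000;

    # Early-exit version: stop enumerating one element past the threshold
    $earlyExit = Measure-Command {
        @(
            Get-ChildItem -File -Path $Working `
                | Select-Object -First ($rotateThreshold + 1) `
        ).Length -gt $rotateThreshold;
    };

    # Full-enumeration version: count every file before comparing
    $fullEnumeration = Measure-Command {
        @(Get-ChildItem -File -Path $Working).Length -gt $rotateThreshold;
    };

    'Early exit:       {0:N2} s' -f $earlyExit.TotalSeconds;
    'Full enumeration: {0:N2} s' -f $fullEnumeration.TotalSeconds;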

    As for generating the new name for the 'Hold' directories, you could save and update an identifier somewhere, or just generate new names with an incrementing suffix until you find one that's not in use. I think it's easier to just base the name on the current time. As long as the script doesn't run more than once a second you'll know the name is unique, they'll sort just as well as numerals, plus it gives you a little diagnostic information (the time that directory was rotated out) for free.
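
    If you'd rather stick with numeric suffixes (HOLD, HOLD1, HOLD2, etc.), a rough sketch of the probe-until-unused approach could look like this, where $HoldBaseDirectory is the parent the Hold directories live in:

    # Probe Hold, Hold1, Hold2, ... until a name that doesn't exist yet is found
    $suffix = 0;
    do
    {
        $newName = if ($suffix -eq 0) { 'Hold' } else { "Hold$suffix" };
        $candidate = Join-Path -Path $HoldBaseDirectory -ChildPath $newName;
        $suffix++;
    } while (Test-Path -Path $candidate)

    # $newName now holds the first unused name and can be passed to Rename-Item -NewName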

    Here's some basic code for the second script:

    $restoreThreshold = 50000;
    $restoreBatchSize = 5000;
    
    # Ensure there's enough room so that restoring $restoreBatchSize
    # files won't push $Working's file count above $restoreThreshold
    while (@(Get-ChildItem -File -Path $Working).Length -lt $restoreThreshold - $restoreBatchSize)
    {
        $holdDirectory = Get-ChildItem -Path $HoldBaseDirectory -Directory -Filter 'Hold_*' `
            | Select-Object -First 1;
    
        if ($holdDirectory -eq $null)
        {
            # There are no Hold directories to process; don't keep looping
            break;
        }
    
        # Restore the first $restoreBatchSize files from $holdDirectory and store the count of files restored
        $restoredCount = Get-ChildItem -File -Path $holdDirectory.FullName `
            | Select-Object -First $restoreBatchSize `
            | Move-Item -Destination $Working -PassThru `
            | Measure-Object `
            | Select-Object -ExpandProperty 'Count';
    
        # If less than $restoreBatchSize files were restored then $holdDirectory is now empty; delete it
        if ($restoredCount -lt $restoreBatchSize)
        {
            Remove-Item -Path $holdDirectory.FullName;
        }
    }
    

    As noted in the comment before the while loop, the condition ensures that the count of files in $Working is at least $restoreBatchSize files away from $restoreThreshold, so that restoring $restoreBatchSize files won't push it past the threshold in the process. For example, with $restoreThreshold = 50000 and $restoreBatchSize = 5000, the loop only runs while $Working holds fewer than 45,000 files, so a full batch can never take it above 50,000. If you don't care about that, or the chosen threshold already accounts for it, you can change the condition to compare against $restoreThreshold instead of $restoreThreshold - $restoreBatchSize. Alternatively, leave the condition the same and change $restoreThreshold to 55000.

    The way I've written the loop, on each iteration at most $restoreBatchSize files are restored from the first 'Hold_*' directory it finds, and then the file count in $Working is reevaluated. Considering that, as I understand it, files are being added to and removed from $Working external to this script and simultaneously with its execution, this is probably both the safest and the simplest approach. You could certainly enhance it by calculating how far below $restoreThreshold you are and performing the necessary number of batch restores, from one or more 'Hold_*' directories, all in one iteration of the loop.
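
    A rough sketch of that enhancement might look like the following: it computes the current deficit and restores up to that many files in a single pass across the 'Hold_*' directories, though it still leaves the empty-directory cleanup from the loop above to be handled separately:

    # How many more files $Working can take before reaching the threshold
    $deficit = $restoreThreshold - @(Get-ChildItem -File -Path $Working).Length;

    if ($deficit -gt 0)
    {
        # Walk the Hold_* directories and move at most $deficit files in total
        Get-ChildItem -Path $HoldBaseDirectory -Directory -Filter 'Hold_*' `
            | Get-ChildItem -File `
            | Select-Object -First $deficit `
            | Move-Item -Destination $Working;
    }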