
Multiple io.filesystemwatchers in parallel


I have three different tasks that I want to hand off to file-system watchers in PowerShell. My code already initializes two watchers and checks every ten seconds that they are still running. The tasks those two perform take under a minute and about 5 minutes, respectively, but the third task I want to hand off to a watcher takes about an hour. I am concerned that, if all of them run simultaneously, events that the first two should react to will not be handled at all while the third watcher is executing its change action. Is there a way to implement or run them so that the change actions execute in parallel?


Solution

  • You can use the Start-ThreadJob cmdlet to run your file-watching tasks in parallel.

    • Start-ThreadJob comes with the ThreadJob module and offers a lightweight, thread-based alternative to the child-process-based regular background jobs. It comes with PowerShell [Core] v6+ and in Windows PowerShell can be installed on demand with, e.g., Install-Module ThreadJob -Scope CurrentUser. In most cases, thread jobs are the better choice, both for performance and type fidelity - see the bottom section of this answer for why.
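
    As a minimal sketch of the parallelism that thread jobs provide (independent of file watching), the snippet below starts two jobs that merely sleep for different lengths of time and measures the total elapsed time. The durations and the `$jobs`, `$results` and `$stopwatch` variable names are illustrative assumptions, not part of the original answer. It relies on module auto-loading to find Start-ThreadJob, so the module must be installed.

```powershell
# Assumes Start-ThreadJob is available (bundled with PowerShell [Core];
# installable on demand in Windows PowerShell).
$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()

# Start two thread jobs that each simulate work of a different length.
$jobs = foreach ($seconds in 3, 2) {
  Start-ThreadJob -ArgumentList $seconds -ScriptBlock {
    param([int] $seconds)
    Start-Sleep -Seconds $seconds
    "Task that slept for $seconds sec. finished."
  }
}

# Wait for both jobs, collect their output, and remove the jobs.
$results = $jobs | Receive-Job -Wait -AutoRemoveJob
$stopwatch.Stop()

$results
"Elapsed: {0:N1} sec." -f $stopwatch.Elapsed.TotalSeconds
```

    If the jobs really run in parallel, the elapsed time should be close to the longer of the two sleeps rather than their sum.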

    The following self-contained sample code:

    • uses thread jobs to run 2 distinct file-monitoring and processing tasks in parallel,
    • which neither block each other nor the caller.

    Note:

    • Each task creates its own System.IO.FileSystemWatcher instance in the code below, though creating too many of them can put a significant load on the system, possibly resulting in events getting missed. An alternative is to share instances, such as creating a single one in the caller's context, which the thread jobs can access (see comments in source code below).

    • [This is in part speculative; do tell us if I got things wrong] Direct FileSystemWatcher .NET event-handler delegates should be kept short. However, subscribing to the events from PowerShell via an event job created by Register-ObjectEvent queues the events on the PowerShell side, and PowerShell then dispatches them to the -Action script blocks. The fact that these blocks perform long-running operations below therefore shouldn't be an immediate concern (the tasks may take a long time to catch up, though).

    # Make sure that the ThreadJob module is available.
    # In Windows PowerShell, it must be installed first.
    # In PowerShell [Core], it is available by default.
    Import-Module ThreadJob -ea Stop
    
    try {
    
      # Use the system's temp folder in this example.
      $dir = (Get-Item -EA Ignore temp:).FullName; if (-not $dir) { $dir = $env:TEMP }
    
      # Define the tasks as an array of custom objects that specify the directory
      # and file-name pattern to monitor as well as the action script block that
      # handles the events.
      $tasks = # array of custom objects, one per task
        [pscustomobject] @{
          DirToMonitor = $dir
          FileNamePattern = '*.tmp1'
          Action = {
            # Print status info containing the event data to the host, synchronously.
            Write-Host -NoNewLine "`nINFO: Event 1 raised:`n$($EventArgs | Format-List | Out-String)"
            # Sleep to simulate blocking the thread with a long-running task.
            Write-Host "INFO: Event 1: Working for 4 secs."
            Start-Sleep 4
            # Create output, which Receive-Job can collect.
            "`nEvent 1 output: " + $EventArgs.Name
          }
        },
        [pscustomobject] @{
          DirToMonitor = $dir
          FileNamePattern = '*.tmp2'
          Action = {
            # Print status info containing the event data to the host, synchronously.
            Write-Host -NoNewLine "`nINFO: Event 2 raised:`n$($EventArgs | Format-List | Out-String)"
            # Sleep to simulate blocking the thread with a long-running task.
            Write-Host "INFO: Event 2: Working for 2 secs"
            Start-Sleep 2
            # Create output, which Receive-Job can collect.
            "`nEvent 2 output: " + $EventArgs.Name
          }  
        } 
    
      # Start a separate thread job for each action task.
      $threadJobs = $tasks | ForEach-Object {
    
        Start-ThreadJob -ArgumentList $_ {
    
          param([pscustomobject] $task)
    
          # Create and initialize a thread-specific watcher.
          # Note: To keep system load low, it's generally better to use a *shared* 
          #       watcher, if feasible. You can define it in the caller's scope
          #       and access here via $using:watcher
          $watcher = [System.IO.FileSystemWatcher] [ordered] @{
            Path   = $task.DirToMonitor
            Filter = $task.FileNamePattern
            EnableRaisingEvents = $true # start watching.
          }
    
          # Subscribe to the watcher's Created events, which returns an event job.
          # This indefinitely running job receives the output from the -Action script
          # block whenever the latter is called after an event fires.
          $eventJob = Register-ObjectEvent -ea stop $watcher Created -Action $task.Action
    
          Write-Host "`nINFO: Watching $($task.DirToMonitor) for creation of $($task.FileNamePattern) files..."
    
          # Indefinitely wait for output from the action blocks and relay it.
          try {
            while ($true) {
              Receive-Job $eventJob
              Start-Sleep -Milliseconds 500  # sleep a little
            }
          }
          finally { 
             # !! This doesn't print, presumably because this is killed by the
             # !! *caller* being killed, which then doesn't relay the output anymore.
            Write-Host "Cleaning up thread for task $($task.FileNamePattern)..."
            # Dispose of the watcher.
            $watcher.Dispose()
            # Remove the event job (and with it the event subscription).
            $eventJob | Remove-Job -Force 
          }
    
        }
    
      }  
    
      $sampleFilesCreated = $false
      $sampleFiles = foreach ($task in $tasks) { Join-Path $task.DirToMonitor ("tmp_$PID" + ($task.FileNamePattern -replace '\*')) }
    
      Write-Host "Starting tasks...`nUse Ctrl-C to stop."
    
      # Indefinitely wait for and display output from the thread jobs.
      # Use Ctrl+C to stop.
      $dtStart = [datetime]::UtcNow
      while ($true) {
    
        # Receive thread job output, if any.
        $threadJobs | Receive-Job
    
        # Sleep a little.
        Write-Host . -NoNewline
        Start-Sleep -Milliseconds 500
    
        # A good while after startup, create sample files that trigger all tasks.
        # NOTE: The delay must be long enough for the task event handlers to already be
        #       in place. How long that takes can vary.
        #       Watch the status output to make sure the files are created
        #       *after* the event handlers became active.
        #       If not, increase the delay or create files manually once
        #       the event handlers are in place.
        if (-not $sampleFilesCreated -and ([datetime]::UtcNow - $dtStart).TotalSeconds -ge 10) {
          Write-Host
          foreach ($sampleFile in $sampleFiles) {
            Write-Host "INFO: Creating sample file $sampleFile..."
            $null > $sampleFile
          }
          $sampleFilesCreated = $true
        }
    
      }
    
    }
    finally {
      # Clean up the thread jobs (if any were created before an error or Ctrl-C).
      if ($threadJobs) { Remove-Job -Force $threadJobs }
      # Remove the temp. sample files
      Remove-Item -ea Ignore $sampleFiles
    }
    

    The above creates output such as the following (sample from a macOS machine):

    Starting tasks...
    Use Ctrl-C to stop.
    .
    INFO: Watching /var/folders/19/0lxcl7hd63d6fqd813glqppc0000gn/T/ for creation of *.tmp1 files...
    
    INFO: Watching /var/folders/19/0lxcl7hd63d6fqd813glqppc0000gn/T/ for creation of *.tmp2 files...
    .........
    INFO: Creating sample file /var/folders/19/0lxcl7hd63d6fqd813glqppc0000gn/T/tmp_91418.tmp1...
    INFO: Creating sample file /var/folders/19/0lxcl7hd63d6fqd813glqppc0000gn/T/tmp_91418.tmp2...
    .
    INFO: Event 1 raised:
    
    ChangeType : Created
    FullPath   : /var/folders/19/0lxcl7hd63d6fqd813glqppc0000gn/T/tmp_91418.tmp1
    Name       : tmp_91418.tmp1
    
    
    INFO: Event 1: Working for 4 secs.
    
    INFO: Event 2 raised:
    
    ChangeType : Created
    FullPath   : /var/folders/19/0lxcl7hd63d6fqd813glqppc0000gn/T/tmp_91418.tmp2
    Name       : tmp_91418.tmp2
    
    
    INFO: Event 2: Working for 2 secs
    ....
    Event 2 output: tmp_91418.tmp2
    ....
    Event 1 output: tmp_91418.tmp1
    .................
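
    The second note above (about events being queued on the PowerShell side) can be checked with a small experiment. The sketch below uses no thread jobs at all: it registers one deliberately slow -Action block and then creates three files in quick succession. The scratch directory, the `*.demo` pattern and all variable names are hypothetical and exist only for this test. It assumes the watcher is active within half a second of being enabled, which may not hold on every platform.

```powershell
# Hypothetical scratch directory, used only for this experiment.
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) "fsw_queue_demo_$PID"
$null = New-Item -ItemType Directory -Path $demoDir -Force

# Set Path before enabling events; the watcher cannot start without one.
$watcher = New-Object System.IO.FileSystemWatcher
$watcher.Path = $demoDir
$watcher.Filter = '*.demo'
$watcher.EnableRaisingEvents = $true

# Deliberately slow handler: one second of simulated work per event.
$eventJob = Register-ObjectEvent -InputObject $watcher -EventName Created -Action {
  Start-Sleep -Seconds 1
  $EventArgs.Name  # output that Receive-Job can collect
}

$handled = @()
try {
  # Give the watcher a moment to become active (assumed to be enough).
  Start-Sleep -Milliseconds 500

  # Create three files much faster than the handler can process them.
  foreach ($i in 1..3) {
    $null = New-Item -ItemType File -Path (Join-Path $demoDir "file$i.demo")
  }

  # Poll the event job until all three names arrived or 20 seconds passed.
  $deadline = [datetime]::UtcNow.AddSeconds(20)
  while ($handled.Count -lt 3 -and [datetime]::UtcNow -lt $deadline) {
    $handled += @(Receive-Job $eventJob)
    Start-Sleep -Milliseconds 200
  }
}
finally {
  # Remove the event job (and its subscription), the watcher, and the files.
  $eventJob | Remove-Job -Force
  $watcher.Dispose()
  Remove-Item -Recurse -Force -ErrorAction Ignore $demoDir
}

"Handled $($handled.Count) of 3 events: $($handled -join ', ')"
```

    If all three file names are reported even though the handler takes a second per event, that is consistent with the events being queued rather than dropped while an action is still running. A lower count would suggest that events were missed or that the watcher was not yet active when the files were created.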