Tags: powershell, powershell-5.1, file-monitoring

PowerShell - How to find which paths a given command will execute against?


Problem

I use the command Get-Content -Path $SearchPath -Wait | Select-String -Pattern $SearchTerm to search debug file(s) for common search terms and display the matches live to the user.

$SearchPath and $SearchTerm are set from user input.

Before starting the search, I would like to show the user which files will be searched.


What I have tried

Attempt #1

Get-ChildItem -Path $SearchPath | Select-Object -ExpandProperty FullName

The problem with this command is that it lists both files and directories, whereas Get-Content only searches files.

Attempt #2

Since I know that all my files end in .log, I added a regex check to filter out the directories, like so:

Get-ChildItem -Path $SearchPath | Where-Object { $_.Name -match '\.log$' } | Select-Object -ExpandProperty FullName

The problem with this command is that if $SearchPath is C:\Path\To\The\* it will list only the files in that directory, but not subdirectories. However, the original Get-Content command will check subdirectories.

For example, it would list C:\Path\To\The\DebugLog.log, but not C:\Path\To\The\SubDir\DebugLog.log, even though Get-Content would search it.

Attempt #3

My next thought was to add -Recurse to include subdirectories, like so:

Get-ChildItem -Path $SearchPath -Recurse | Where-Object { $_.Name -match '\.log$' } | Select-Object -ExpandProperty FullName

This solves the first problem, but now if $SearchPath is C:\Path\To\The\DebugLog.log, it lists both C:\Path\To\The\DebugLog.log and C:\Path\To\The\Subfolder\DebugLog.log, whereas the original Get-Content command only searches C:\Path\To\The\DebugLog.log.


Question

How can I check which files the Get-Content command will search before executing it?

Is Get-ChildItem the right command for this, or is there a better option?


Solution

  • Note that with $SearchPath = 'C:\Path\To\The\*', Get-Content -Path $SearchPath does not search subdirectories: the wildcard is resolved only in the specified directory, and passing a directory path to Get-Content results in an error.

    • To limit retrieval of file-system items to files, use the -File switch.

    • To limit recursive retrieval to a specified depth, use the -Depth parameter (which implies the -Recurse switch).

    Therefore, to retrieve all *.log files in the current directory and its immediate subdirectories, use the following:

    $logFilesOfInterest =
      Get-ChildItem -File -Depth 1 -Filter *.log 
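Since the goal is to show the user the files before the search starts, listing the FullName property of the result is enough. The sketch below does that in a throwaway temporary folder (the layout and file names are hypothetical, and -LiteralPath $root is added so the demo doesn't depend on the current directory). As an alternative for when $SearchPath is an arbitrary user-supplied -Path argument, it also uses Get-Item, which should expand wildcards one level deep, as Get-Content -Path does; directories are filtered out because Get-Content cannot read them.

```powershell
# Hypothetical demo layout in a temporary folder:
#   A.log, B.log and SubDir/C.log
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$subDir = Join-Path $root 'SubDir'
$null = New-Item -ItemType Directory -Path $subDir -Force
'x' | Set-Content -LiteralPath (Join-Path $root 'A.log')
'y' | Set-Content -LiteralPath (Join-Path $root 'B.log')
'z' | Set-Content -LiteralPath (Join-Path $subDir 'C.log')

# Recursive variant: *.log files in $root and its immediate subdirectories.
$logFilesOfInterest = Get-ChildItem -LiteralPath $root -File -Depth 1 -Filter *.log

# Show the user the full paths before starting the search.
$logFilesOfInterest.FullName

# Alternative for an arbitrary -Path argument such as 'C:\Path\To\The\*':
# Get-Item expands the wildcard one level deep, without recursion;
# directories are dropped because Get-Content cannot read them.
$SearchPath = Join-Path $root '*'
$filesToSearch = Get-Item -Path $SearchPath | Where-Object { -not $_.PSIsContainer }
$filesToSearch.FullName

# Clean up the demo folder.
Remove-Item -LiteralPath $root -Recurse -Force
```

The recursive variant should list A.log, B.log and SubDir/C.log, while the Get-Item variant should list only A.log and B.log, mirroring what Get-Content -Path would read.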
    

    You can then pass these files to your Get-Content pipeline, using either of the following approaches:

    # Either: Use the pipeline.
    $logFilesOfInterest | Get-Content -Wait | Select-String -Pattern $SearchTerm
    
    # Or: Pass them as *arguments*.
    Get-Content -LiteralPath $logFilesOfInterest.FullName -Wait | 
      Select-String -Pattern $SearchTerm
    

    However, in neither case will -Wait work as you intend:

    • Get-Content's -Wait switch effectively supports only one input file: it reads the first file given and then waits indefinitely for more data to be appended to it, without ever reading the other files. It moves on to the next file only if the file currently being monitored is deleted.

    To monitor multiple files simultaneously, you'll need to use a form of parallelism, which PowerShell (Core) 7+ can provide via the -Parallel parameter of ForEach-Object:

    # PSv7+ only.
    
    $logFilesOfInterest =
      Get-ChildItem -File -Depth 1 -Filter *.log 
    
    $searchTerm = '.' # Specify your search term here.
    
    $logFilesOfInterest |
      ForEach-Object -ThrottleLimit $logFilesOfInterest.Count -Parallel {
        $_ | Get-Content -Wait | Select-String -Pattern $using:searchTerm 
      }
    
    • -ThrottleLimit $logFilesOfInterest.Count ensures that a thread is spun up for every file to be monitored right away.

    • $using:searchTerm is required to refer to the $searchTerm variable in the caller's scope, because each -Parallel script block runs in a separate runspace.

    • Use Ctrl-C to stop monitoring.
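The effect of $using: can be seen in a minimal, finite sketch that involves no file monitoring (PSv7+; the variable $factor and the input range are hypothetical): each -Parallel script block runs in its own runspace, so caller variables have to be pulled in via $using:.

```powershell
# PSv7+ only.
$factor = 10 # Hypothetical variable in the caller's scope.

$results = 1..3 | ForEach-Object -ThrottleLimit 3 -Parallel {
  # This script block runs in a separate runspace; a plain $factor would
  # be undefined here, so the caller's value is referenced via $using:.
  $_ * $using:factor
}

# -Parallel does not guarantee output order, so sort for display.
$results | Sort-Object
```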


    In Windows PowerShell, you'll have to use background jobs, which is more cumbersome:

    $logFilesOfInterest =
      Get-ChildItem -File -Depth 1 -Filter *.log 
    
    $searchTerm = '.' # Specify your search term here.
    
    $logFilesOfInterest |
      ForEach-Object -OutVariable jobs {
        Start-Job {
          Get-Content -LiteralPath ($using:_).FullName -Wait | 
            Select-String -Pattern $using:searchTerm |
            ForEach-Object Line
        }
      } |
      Receive-Job -Wait
    
    • Use Ctrl-C to stop monitoring.

    • After doing so, you can clean up the jobs with $jobs | Remove-Job -Force; the $jobs variable, which contains all launched jobs, was created with -OutVariable jobs.

    • Note the use of ForEach-Object Line: it is needed to work around a bug in Windows PowerShell that prevents direct Select-String output from surfacing via Receive-Job. PowerShell (Core) no longer has this problem.

    • As an aside (it won't make much difference here):

      • The Start-ThreadJob cmdlet offers a lightweight, much faster thread-based alternative to the child-process-based regular background jobs created with Start-Job. It comes with PowerShell (Core) 7+ and can be installed on demand in Windows PowerShell with, e.g., Install-Module ThreadJob -Scope CurrentUser. In most cases, thread jobs are the better choice, both for performance and for type fidelity; see the bottom section of this answer for why.
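The ForEach-Object Line workaround can be tried out in a finite sketch that doesn't monitor any files (the input strings and the search term are hypothetical). Start-Job, $using: and Receive-Job -AutoRemoveJob should be available in both editions; -AutoRemoveJob removes the job once its output has been received, so no separate Remove-Job call is needed here.

```powershell
$searchTerm = 'o' # Hypothetical search term.

$lines = Start-Job {
  # Emit only the text of each matching line, as plain strings, so the
  # output also surfaces via Receive-Job in Windows PowerShell.
  'foo', 'bar', 'boo' |
    Select-String -Pattern $using:searchTerm |
    ForEach-Object Line
} | Receive-Job -Wait -AutoRemoveJob

$lines
```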