Tags: powershell, foreach, parallel-processing, scriptblock, foreach-object

PowerShell Make ForEach Loop Parallel


This is working code:

$ids = 1..9 
$status  = [PSCustomObject[]]::new(10)
foreach ($id in $ids)
{ 
   $uriStr      = "http://192.168." + [String]$id + ".51/status"
   $uri         = [System.Uri] $uriStr
   $status[$id] = try {Invoke-RestMethod -Uri $uri -TimeOut 30}catch{}
}
$status

I would like to execute the ForEach loop in parallel to explore performance improvements.

The first thing I tried (which turned out to be naive) was to simply introduce the -parallel parameter:

$ids = 1..9 
$status  = [PSCustomObject[]]::new(10)
foreach -parallel ($id in $ids)
{ 
   $uriStr      = "http://192.168." + [String]$id + ".51/status"
   $uri         = [System.Uri] $uriStr
   $status[$id] = try {Invoke-RestMethod -Uri $uri -TimeOut 30}catch{}
}
$status

This results in the following error, which suggests that this feature is still being considered for development as of PowerShell 7.3.9:

ParserError: 
Line |
   3 |  foreach -parallel ($id in $ids)
     |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | The foreach '-parallel' parameter is reserved for future use.

I say naive because the documentation says the -parallel parameter is only valid in a workflow. However, when I try to define a workflow, I get an error saying that workflows are no longer supported:

workflow helloworld {Write-Host "Hello World"}
ParserError: 
Line |
   1 |  workflow helloworld {Write-Host "Hello World"}
     |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | Workflow is not supported in PowerShell 6+.

Then I tried various combinations from several references (Good Example). These point out that the foreach statement is fundamentally different from the ForEach-Object cmdlet, which does support -Parallel. Here is what I tried (basically piping the IDs in):

$ids = 1..9 
$status  = [PSCustomObject[]]::new(10)
$ids | ForEach-Object -Parallel 
{ 
   $uriStr      = "http://192.168." + [String]$_ + ".51/status"
   $uri         = [System.Uri] $uriStr
   $status[$_] = try {Invoke-RestMethod -Uri $uri -TimeOut 30}catch{}
}
$status

This generates the following error:

ForEach-Object: 
Line |
   3 |  $ids | foreach-object -parallel
     |                        ~~~~~~~~~
     | Missing an argument for parameter 'Parallel'. Specify a parameter of type
     | 'System.Management.Automation.ScriptBlock' and try again.

   $uriStr      = "http://192.168." + [String]$_ + ".51/status"
   $uri         = [System.Uri] $uriStr
   $status[$i_] = try {Invoke-RestMethod -Uri $uri -TimeOut 30}catch{}

After trying various script-block syntaxes, here is the best I could do (basically applying the $using: scope modifier to the $status variable, which is defined outside the script block):

$ids = 1..9 
$status  = [PSCustomObject[]]::new(10)
$myScriptBlock = 
{ 
   $uriStr      = "http://192.168." + [String]$_ + ".51/status"
   $uri         = [System.Uri] $uriStr
   {$using:status}[$_] = try {Invoke-RestMethod -Uri $uri -TimeOut 30}catch{}
}
$ids | foreach-object -parallel $myScriptBlock
$status

Again, an error: unable to index into a script block:

InvalidOperation: 
Line |
   4 |  … ng:status}[$_] = try {Invoke-RestMethod -Uri $uri -TimeOut 30}catch{}
     |                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | Unable to index into an object of type "System.Management.Automation.ScriptBlock".

There are a couple of other errors worth mentioning. Without the $using: qualifier, I get the error

"cannot index into null array"

This basically means that the $status variable isn't recognized inside the ForEach-Object -Parallel script block.

All other ways of expressing the $using: qualifier that I tried are rejected with errors such as

"assignment expression invalid" "use {}..."

I have omitted those for brevity and so that the problem statement flows better. Finally, here is a reference on script blocks for PowerShell 7.3+. I also went through it without much progress.


Solution

  • The following should work as intended (see the NOTE source-code comments below):

    $ids = 1..9 
    $status  = [PSCustomObject[]]::new(10)
    $ids | ForEach-Object -Parallel {  # NOTE: Opening { MUST be here.
       $uri = [System.Uri] "http://192.168.$_.51/status"
       # NOTE: (...) is required around the $using: reference.
       ($using:status)[$_] = try { Invoke-RestMethod -Uri $uri -TimeOut 30 } catch {}
    }
    $status
    

    Note: Since $_ is used as the array index ([$_]), the results for your 9 input IDs are stored in the array elements starting with the second one (whose index is 1), meaning that $status[0] will remain $null. Perhaps you meant to use 0..9.

    • You're using PowerShell (Core) 7+, in which PowerShell workflows aren't supported anymore; therefore, the foreach statement doesn't support -parallel there.

    • However, PowerShell 7+ does support -Parallel as a parameter of the ForEach-Object cmdlet[1] for multi-threaded execution.

      • The behavior here is the same as without -Parallel (i.e., with the often positionally bound -Process parameter). The script block ({ ... }) that you pass as an argument to the cmdlet does not use a self-chosen iterator variable the way a foreach statement does (foreach ($id in $ids) ...). Instead, it receives its input from the pipeline and uses the automatic $_ variable to refer to the input object at hand, as shown above.

      • Because the ForEach-Object cmdlet is a type of command - as opposed to a language statement such as foreach (or an expression such as 'foo'.Length) - it is parsed in argument (parsing) mode:

        • A command must be specified on a single line, EXCEPT if:

          • explicitly indicated otherwise with a line continuation (placing a ` (the so-called backtick) at the very end of the line)

          • or the line is unambiguously syntactically incomplete and forces PowerShell to keep parsing for the end of the command on the next line.

        • Language statements (e.g., foreach and if) and expressions (e.g. .NET method calls), which are parsed in expression (parsing) mode, are generally not subject to this constraint.[2]

        • With a script-block argument, you can make a command multiline by using the syntactically-incomplete technique:

          • Placing the script block's opening { on the first line lets you place the block's content on subsequent lines, as shown above.
          • Note that the content of a script block is a new parsing context, in which the above rules apply again.
    • A $using: reference accesses the value of a variable from the caller's scope. It must be enclosed in (...), the grouping operator, before you can apply any of the following operations to it: setting a property or an element identified by an index (such as [$_]), getting a property value via an expression, getting an element via a non-literal index, or calling a method.

      • Arguably, this shouldn't be necessary, but it is as of PowerShell 7.3.9. See GitHub issue #10876 for a discussion.

      • As for your {$using:status}[$_] attempt: the {...} enclosure created a script block, which doesn't make sense here.[3] Perhaps you meant to delimit the identifier part of the $using: reference. In that case, the {...} enclosure goes after the $, as in ${using:status}. However, that (a) isn't necessary here and (b) doesn't solve the problem: (...) around the entire reference is needed either way.

    • A note on thread safety:

      • Because you're using an array to store your results, and because arrays are fixed-size data structures and you make each thread (runspace) target a dedicated element of your array, there is no need to manage concurrent access explicitly.

      • More typically, however, with variable-size data structures and/or in cases where multiple threads may access the same element, managing concurrency is necessary.

      • An alternative to filling a data structure provided by the caller is to simply make the script block output results, which the caller can collect; however, unless this output also identifies the corresponding input object, this correspondence is then lost.

      • This answer elaborates on the last two points (thread-safe data structures vs. outputting results).
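The solution's pattern can be tried without any devices on the network. The following is a minimal sketch for PowerShell 7+. It assumes a hypothetical simulated result object (with made-up Id and Address properties) in place of the real Invoke-RestMethod call. The sketch fills a fixed-size array via ($using:status)[$_] and uses 0..9 so that element 0 is used too. It also shows both ways of making a ForEach-Object call span multiple lines: putting the opening { on the first line, or ending the line with a backtick.

```powershell
#requires -Version 7.0
# Offline sketch: a simulated result object stands in for Invoke-RestMethod
# (hypothetical payload), so no device at 192.168.x.51 is contacted.
$ids    = 0..9                                    # 0-based IDs, so index 0 is used too
$status = [PSCustomObject[]]::new($ids.Count)     # fixed-size: one slot per ID

# Multiline form 1: the opening { sits on the same line as the cmdlet call.
$ids | ForEach-Object -ThrottleLimit 5 -Parallel {
    Start-Sleep -Milliseconds 100                 # simulate network latency
    $result = [PSCustomObject]@{ Id = $_; Address = "192.168.$_.51" }
    # (...) around the $using: reference is required for indexed assignment.
    # Each runspace writes only to its own slot, so no locking is needed.
    ($using:status)[$_] = $result
}

# Multiline form 2: an explicit line continuation. The backtick must be the
# very last character on the line (no trailing whitespace).
$doubled = $ids | ForEach-Object -Parallel `
{
    $_ * 2                                        # output order is not guaranteed
}

$status | Format-Table Id, Address
($doubled | Sort-Object) -join ','                # → 0,2,4,6,8,10,12,14,16,18
```

The results of form 2 are sorted before display because -Parallel emits output in completion order, not input order.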
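The thread-safety bullets above mention two alternatives to a caller-provided fixed-size array. Both are sketched below for PowerShell 7+, with simulated status strings (hypothetical) standing in for the REST call. The first alternative outputs results tagged with their input ID. The second shares a thread-safe System.Collections.Concurrent.ConcurrentDictionary via $using:.

```powershell
#requires -Version 7.0
$ids = 1..9

# Alternative 1: each runspace outputs a result that carries its input ID,
# so the input/output correspondence survives any completion order.
$results = $ids | ForEach-Object -Parallel {
    # Hypothetical stand-in for: try { Invoke-RestMethod -Uri ... } catch {}
    [PSCustomObject]@{ Id = $_; Status = "status of 192.168.$_.51" }
} | Sort-Object Id                                # restore input order

# Alternative 2: a variable-size, thread-safe collection shared via $using:.
$dict = [System.Collections.Concurrent.ConcurrentDictionary[int, object]]::new()
$ids | ForEach-Object -Parallel {
    # (...) around the $using: reference is required for the method call.
    $null = ($using:dict).TryAdd($_, "status of 192.168.$_.51")
}

$results | Format-Table
$dict[5]                                          # look up the result for ID 5
```

In the first alternative, sorting by the tagged Id restores the correspondence. Without such a property, the output order of -Parallel cannot be relied on.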


    [1] Somewhat confusingly, ForEach-Object has an alias also named foreach. It is the syntactic context (the parsing mode) that determines in a given statement whether foreach refers to the foreach (language) statement or the ForEach-Object cmdlet; e.g. foreach ($i in 1..3) { $i } (statement) vs. 1..3 | foreach { $_ } (cmdlet).

    [2] However, if an expression is syntactically complete on a given line, PowerShell also stops parsing there. This is a notable pitfall with ., the member-access operator: unlike in C#, for instance, . must be placed on the same line as the object / expression it is applied to. E.g., 'foo'.<newline> Length works, but 'foo'<newline> .Length does not. Additionally, the . must immediately follow the target object / expression, even on a single line (e.g., 'foo' .Length breaks too).

    [3] PowerShell handles list-like collections and scalars (single objects) in a unified way (see this answer). As a result, indexing into a script block technically works when getting a value. Indices [0] and [-1] return the scalar itself (e.g., $var = 42; $var[0]). All other indices return $null by default, but cause an error if Set-StrictMode -Version 3 or higher is in effect. However, an attempt to assign a value categorically fails (e.g., $var = 42; $var[0] = 43).
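The behavior described in footnote 3 can be checked directly. The following is a minimal sketch, assuming strict mode is off (the default). Reading a scalar by index works, but assigning through an index fails, both for an integer and for a script block. This is consistent with the "Unable to index" error from the {$using:status}[$_] attempt.

```powershell
Set-StrictMode -Off            # default; with strict mode 3+, other indices raise errors

$var = 42
$var[0]                        # → 42 (the scalar itself)
$var[-1]                       # → 42 (also the scalar itself)
$null -eq $var[1]              # → True (any other index yields $null)

# Assigning through an index fails for a non-collection ...
$assignFailed = $false
try { $var[0] = 43 } catch { $assignFailed = $true; $_.Exception.Message }

# ... and a { ... } script block is a non-collection too, as in {$using:status}[$_].
$sb = { 'not an array' }
$sbAssignFailed = $false
try { $sb[0] = 'x' } catch { $sbAssignFailed = $true; $_.Exception.Message }
```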