Using continue/return statement inside .ForEach() method - is it better to use foreach ($item in $collection) instead?


It's fairly well documented that foreach processing speed varies depending on the way a foreach loop is carried out (ordered from fastest to slowest):

  1. .ForEach() method
  2. foreach ($item in $collection) {}
  3. $collection | ForEach-Object {}


  • When working with (very) large collections, speed comparisons between #1 and #2 can be significant, and the overhead of piping makes #3 a non-candidate in those cases.
  • #2 offers the continue statement, while #1 does not
    • Please correct/comment if this is inaccurate
  • From what I've seen online and in real life, return is how to "continue" when using the .ForEach() method.



My questions:

  1. When the speed advantage of the .ForEach() method is too big to settle for foreach and you need to continue, what is the proper way to continue when using .ForEach({})?
  2. What are the implications or gotchas you should be aware of when using return inside the .ForEach() method?

Solution

  • ...and the overhead of piping makes #3 a non-candidate in those cases.

    Incorrect. The pipeline itself is very efficient; it's almost on par with foreach, which is the fastest way to enumerate a collection in PowerShell. ForEach-Object is the inefficient part, because it dot-sources the script block, which prevents local optimizations.
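
A small sketch of what dot-sourcing means in practice (the variable names `$counter` and `$total` are only illustrative): a script block passed to ForEach-Object runs in the caller's scope, while a script block invoked with `&` gets its own child scope.

```powershell
# ForEach-Object dot-sources its script block, so the assignment below
# changes $counter in the caller's scope.
$counter = 0
1..3 | ForEach-Object { $counter += $_ }

# A script block invoked with & runs in a child scope: the assignment creates
# a new local $total there, and the caller's $total keeps its value.
$total = 0
1..3 | & { process { $total += $_ } }

"ForEach-Object: $counter"   # ForEach-Object: 6
"process block:  $total"     # process block:  0
```

This only illustrates the scoping difference; it says nothing by itself about how much time the missing optimizations cost.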


    If you want to go deep, ForEach-Object calls ScriptBlock.InvokeUsingCmdlet with the useLocalScope argument set to false (in PowerShell terms, the script block is dot-sourced):

            private void ProcessScriptBlockParameterSet()
            {
                for (int i = _start; i < _end; i++)
                {
                    // Only execute scripts that aren't null. This isn't treated as an error
                    // because it allows you to parameterize a command - for example you might allow
                    // for actions before and after the main processing script. They could be null
                    // by default and therefore ignored then filled in later...
                    _scripts[i]?.InvokeUsingCmdlet(
                        contextCmdlet: this,
                        useLocalScope: false,
                        errorHandlingBehavior: ScriptBlock.ErrorHandlingBehavior.WriteToCurrentErrorPipe,
                        dollarUnder: InputObject,
                        input: new object[] { InputObject },
                        scriptThis: AutomationNull.Value,
                        args: Array.Empty<object>());
                }
            }
    

    This has been discussed in several GitHub issues; perhaps the most relevant one is #10982.


    .ForEach is almost never a good alternative, as the tests below clearly show. In addition, its output type is always a Collection<T>:

    ''.ForEach({ }).GetType()
    
    #   Namespace: System.Collections.ObjectModel
    #
    # Access        Modifiers           Name
    # ------        ---------           ----
    # public        class               Collection<PSObject>...
    

    .ForEach doesn't stream its output: it processes the whole collection before returning anything. That means there is no way to exit the loop early with Select-Object -First, and it also means higher memory consumption.

    Measure-Command {
        (0..10).ForEach({ $_; Start-Sleep -Milliseconds 200 }) | Select-Object -First 1
    } | ForEach-Object TotalSeconds
    
    # 2.2637483
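
For contrast, here is a hedged sketch of the same measurement with a streaming script block (a `process` block invoked with `&`). Select-Object -First 1 stops the pipeline as soon as the first item arrives, so the elapsed time should be well under a second; exact timings will vary by machine. The variable name `$elapsed` is only illustrative.

```powershell
# Same workload as above, but streamed through a process block.
# Select-Object -First 1 stops the upstream command after the first item.
$elapsed = Measure-Command {
    0..10 | & { process { $_; Start-Sleep -Milliseconds 200 } } |
        Select-Object -First 1
}

$elapsed.TotalSeconds
```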
    

    As for the second question, return is the closest you can get to continue: it exits the current invocation of the script block early and moves on to the next item in the collection. There are no gotchas as long as that is understood. However, there is no real way to break out of the loop when using .ForEach.
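
A minimal sketch of that equivalence, using made-up sample data (the numbers 1 to 6) and illustrative variable names: `return` inside .ForEach() skips the current item much like `continue` does inside foreach.

```powershell
# return ends the current script block invocation; .ForEach() then
# moves on to the next item.
$viaMethod = (1..6).ForEach({
    if ($_ % 2) { return }   # skip odd numbers
    $_ * 10
})

# The same logic with the foreach statement and continue.
$viaStatement = foreach ($n in 1..6) {
    if ($n % 2) { continue } # skip odd numbers
    $n * 10
}

$viaMethod -join ','      # 20,40,60
$viaStatement -join ','   # 20,40,60
```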

    You probably know this already, but break and continue should not be used outside of a loop, switch, or trap:

    & {
        ''.ForEach({ break })
        'will never get here'
    }
    
    'or here'
    

    If you're looking for performance, rely on foreach, a script block with a process block, or a function with a process block.

    $range = [System.Linq.Enumerable]::Range(0, 1mb)
    $tests = @{
        'foreach' = {
            foreach ($i in $args[0]) { $i }
        }
        '.ForEach()' = {
            $args[0].ForEach({ $_ })
        }
        'ForEach-Object' = {
            $args[0] | ForEach-Object { $_ }
        }
        'process block' = {
            $args[0] | & { process { $_ } }
        }
    }
    
    $tests.GetEnumerator() | ForEach-Object {
        [pscustomobject]@{
            Test = $_.Key
            Time = (Measure-Command { & $_.Value $range }).TotalMilliseconds
        }
    } | Sort-Object Time
    
    # Test              Time
    # ----              ----
    # foreach         103.96
    # process block   918.04
    # .ForEach()     3614.44
    # ForEach-Object 9046.14
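
The process-block alternatives recommended above also support skipping items: inside a `process` block, `return` ends the handling of the current pipeline item and processing resumes with the next one. The sketch below assumes a hypothetical function name, `Select-Even`, and sample input of 1 to 6.

```powershell
# Hypothetical helper: emits only even numbers from the pipeline.
function Select-Even {
    param(
        [Parameter(ValueFromPipeline)]
        [int] $Number
    )
    process {
        if ($Number % 2) { return }   # skip odd input, go to the next item
        $Number
    }
}

$fromFunction = 1..6 | Select-Even

# The same idea with an anonymous script block.
$fromBlock = 1..6 | & { process { if ($_ % 2) { return }; $_ } }

$fromFunction -join ','   # 2,4,6
$fromBlock -join ','      # 2,4,6
```

Unlike .ForEach(), both forms stream their output, so they can also be cut short downstream with Select-Object -First.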