
Why should I avoid using the increase assignment operator (+=) to create a collection?


The increase assignment operator (+=) is often used in PowerShell questions and answers on Stack Overflow to construct a collection of objects, e.g.:

$Collection = @()
1..$Size | ForEach-Object {
    $Collection += [PSCustomObject]@{Index = $_; Name = "Name$_"}
}

Yet it appears to be a very inefficient operation.

Is it OK to state in general that the increase assignment operator (+=) should be avoided for building an object collection in PowerShell?


Solution

  • Note: 2024-08-30

    A new version of PowerShell (see v7.5.0-preview.4), which brings a major improvement for this issue, is about to come out. Although explicit assignment is probably still recommended, the benchmarks in this answer are outdated.
    For more details see the helpful addendum from Santiago Squarzon.

    Yes, the increase assignment operator (+=) should be avoided for building an object collection (see also: PowerShell scripting performance considerations).
    Apart from the facts that:

    • using the += operator usually requires more statements (because of the array initialization = @()), and
    • it encourages storing the whole collection in memory rather than pushing it immediately into the pipeline

    ... it is inefficient.

    The reason it is inefficient is that every time you use the += operator, it effectively does:

    $Collection = $Collection + $NewObject
    

    Because arrays are immutable in terms of element count, the whole (growing) collection will be recreated with every iteration.
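
The re-creation described above can be observed directly. The following is a minimal sketch (the variable names are illustrative, not from the original answer): it keeps a second reference to an array, applies +=, and then checks whether both variables still point to the same object.

```powershell
# Arrays created with @() are fixed-size .NET arrays.
$Original = @(1, 2, 3)
$Alias    = $Original      # second reference to the same array object

$Original += 4             # builds a new array and rebinds $Original to it

$IsFixedSize   = $Alias.IsFixedSize                            # True
$SameInstance  = [object]::ReferenceEquals($Original, $Alias)  # False
$AliasCount    = $Alias.Count                                  # 3 (old array is untouched)
$OriginalCount = $Original.Count                               # 4 (new array)
```

Because $Alias still holds the old three-element array, the += must have produced a separate array rather than extending the existing one.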

    The idiomatic PowerShell approach is to assign the pipeline output directly:

    $Collection = 1..$Size | ForEach-Object {
        [PSCustomObject]@{Index = $_; Name = "Name$_"}
    }
    

    Note: as with other cmdlets, if there is just one item (iteration), the output will be a scalar and not an array. To force it to an array, you can either use the [Array] type: [Array]$Collection = 1..$Size | ForEach-Object { ... } or the array subexpression operator @( ): $Collection = @(1..$Size | ForEach-Object { ... })
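
As a small illustration of the note above, the sketch below runs the same one-item pipeline three ways and checks the resulting type. The variable names $Scalar, $Wrapped and $Typed are made up for this example.

```powershell
# One iteration: the single result is assigned as a scalar, not an array.
$Scalar = 1..1 | ForEach-Object { [PSCustomObject]@{Index = $_; Name = "Name$_"} }

# Same pipeline wrapped in the array subexpression operator @( ).
$Wrapped = @(1..1 | ForEach-Object { [PSCustomObject]@{Index = $_; Name = "Name$_"} })

# Same pipeline with an [Array] type constraint on the variable.
[Array]$Typed = 1..1 | ForEach-Object { [PSCustomObject]@{Index = $_; Name = "Name$_"} }

$Scalar  -is [array]   # False
$Wrapped -is [array]   # True
$Typed   -is [array]   # True
```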

    It is even recommended not to store the results in a variable ($a = ...) at all, but to pass them immediately into the pipeline to save memory, e.g.:

    1..$Size | ForEach-Object {
        [PSCustomObject]@{Index = $_; Name = "Name$_"}
    } | Export-Csv .\Outfile.csv
    

    Note: using the System.Collections.ArrayList class could also be considered. It is generally almost as fast as the PowerShell pipeline, but the disadvantage is that it consumes a lot more memory than (properly) using the PowerShell pipeline.
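
For reference, a minimal sketch of the ArrayList alternative mentioned in the note above might look as follows. The small $Size value and the use of a foreach statement are assumptions made for this illustration. The key point is that items are appended with the Add() method rather than with +=.

```powershell
$Size = 5   # small illustrative size (assumption)

$List = [System.Collections.ArrayList]::new()
foreach ($i in 1..$Size) {
    # ArrayList.Add() returns the index of the new item;
    # [void] keeps that number out of the output stream.
    [void]$List.Add([PSCustomObject]@{Index = $i; Name = "Name$i"})
}

$List.Count      # 5
$List[0].Name    # Name1
```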

    See also: Fastest Way to get a uniquely index item from the property of an array and Array causing 'system.outofmemoryexception'

    Performance measurement

    To show the relation between the collection size and the decrease in performance, you might check the following test results (timings are in ticks, as returned by Measure-Command):

    1..20 | ForEach-Object {
        $size = 1000 * $_
        $Performance = @{Size = $Size}
        $Performance.Pipeline = (Measure-Command {
            $Collection = 1..$Size | ForEach-Object {
                [PSCustomObject]@{Index = $_; Name = "Name$_"}
            }
        }).Ticks
        $Performance.Increase = (Measure-Command {
            $Collection = @()
            1..$Size | ForEach-Object {
                $Collection  += [PSCustomObject]@{Index = $_; Name = "Name$_"}
            }
        }).Ticks
        [pscustomobject]$Performance
    } | Format-Table *,@{n='Factor'; e={$_.Increase / $_.Pipeline}; f='0.00'} -AutoSize
    
     Size  Increase Pipeline Factor
     ----  -------- -------- ------
     1000   1554066   780590   1.99
     2000   4673757  1084784   4.31
     3000  10419550  1381980   7.54
     4000  14475594  1904888   7.60
     5000  23334748  2752994   8.48
     6000  39117141  4202091   9.31
     7000  52893014  3683966  14.36
     8000  64109493  6253385  10.25
     9000  88694413  4604167  19.26
    10000 104747469  5158362  20.31
    11000 126997771  6232390  20.38
    12000 148529243  6317454  23.51
    13000 190501251  6929375  27.49
    14000 209396947  9121921  22.96
    15000 244751222  8598125  28.47
    16000 286846454  8936873  32.10
    17000 323833173  9278078  34.90
    18000 376521440 12602889  29.88
    19000 422228695 16610650  25.42
    20000 475496288 11516165  41.29
    

    This means that, with a collection size of 20,000 objects, using the += operator is about 40x slower than using the PowerShell pipeline.

    Steps to correct a script

    Apparently some people struggle with correcting a script that already uses the increase assignment operator (+=). Therefore, I have written a short set of instructions to do so:

    Note:
    Changing the array initialization <Variable> = @() to something like <Variable> = [Collections.Generic.List[Object]]::new() is not enough to resolve this performance issue, because as soon as you use the + or += operator on it, it is changed back to an immutable array type.
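
The type change described in this note can be checked with a short sketch like the one below (variable names are illustrative). It compares the type of a generic list before and after +=, and contrasts that with calling the list's own Add() method.

```powershell
$List = [Collections.Generic.List[Object]]::new()
$TypeBefore = $List.GetType().Name     # List`1

$List += 'first item'                  # += turns the list into an array
$IsArrayAfter = $List -is [array]      # True

# Calling the list's own Add() method keeps the list type.
$Kept = [Collections.Generic.List[Object]]::new()
$Kept.Add('first item')
$KeptIsArray = $Kept -is [array]       # False
```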

    1. Remove all the <variable> += assignments from the iteration concerned and leave only the object item. By not assigning the object, the object will simply be put on the pipeline.
      It doesn't matter if there are multiple increase assignments in the iteration or if there are embedded iterations or functions; the end result will be the same.
      Meaning, this:

    ForEach ( ... ) {
        $Array += $Object1
        $Array += $Object2
        ForEach ( ... ) {
            $Array += $Object3
            $Array += Get-Object
        }
    }
    

    Is essentially the same as:

    ForEach ( ... ) {
        $Object1
        $Object2
        ForEach ( ... ) {
            $Object3
            Get-Object
        }
    }
    

    Note: if there is no iteration, there is probably no reason to change your script, as it likely only concerns a few additions.

    2. Assign the output of the iteration (everything that is put on the pipeline) to the variable concerned. This is usually at the same level as where the array was initialized ($Array = @()), e.g.:

    $Array = ForEach ( ... ) { ...
    

    Note 1: Again, if you want a single object to act as an array, you probably want to use the array subexpression operator @( ), but you might also consider doing this at the moment you use the array, like: @($Array).Count or ForEach ($Item in @($Array))
    Note 2: Again, you're better off not assigning the output at all. Instead, pass the pipeline output directly to the next cmdlet to free up memory: ... | ForEach-Object {...} | Export-Csv .\File.csv.

    3. Remove the array initialization <Variable> = @()
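
The steps above can be sketched as a before/after pair. This is only an illustration under assumed names ($Before, $After) and a deliberately small $Size. Both loops produce the same objects, but the second one lets the foreach statement emit them and assigns the output once.

```powershell
$Size = 3   # small illustrative size (assumption)

# Before: array initialization plus += inside the loop.
$Before = @()
foreach ($i in 1..$Size) {
    $Before += [PSCustomObject]@{Index = $i; Name = "Name$i"}
}

# After: the += is removed, the loop output is assigned,
# and the @() initialization is gone.
$After = foreach ($i in 1..$Size) {
    [PSCustomObject]@{Index = $i; Name = "Name$i"}
}

$Before.Count   # 3
$After.Count    # 3
```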

    For a full example, see:

    Note that the same applies to using += to build strings (see: Is there a string concatenation shortcut in PowerShell?) and to building hash tables, like: $HashTable += @{ $NewName = $Value }
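
As a rough sketch of what avoiding += could look like in those two cases (these are common alternatives, not necessarily the ones recommended in the linked answer): strings can be joined once from pipeline output, and hash table entries can be assigned by key. The names and the small $Size are assumptions.

```powershell
$Size = 5   # small illustrative size (assumption)

# Strings: emit the pieces and join them once, instead of $String += ...
$String = (1..$Size | ForEach-Object { "Name$_" }) -join ','

# Hash tables: assign by key, instead of $HashTable += @{ ... }
$HashTable = @{}
foreach ($i in 1..$Size) {
    $HashTable["Name$i"] = $i
}

$String             # Name1,Name2,Name3,Name4,Name5
$HashTable.Count    # 5
```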